Chapter 1
Semantic Introduction to Databases

Database From data and base (adj = low, mean, vile, etc). A place where data can be lost in a structured manner.
DBMS
(Database Management System)
The software needed to set up highly complex inter-relational data structures, so that files can be lost in any convenient sequence (e.g. Index before data; First-in-last-out).
From a folklore dictionary.

This chapter defines fundamental concepts of databases. These concepts are described here in terms of the Semantic Binary Model (SBM) of data. A data model is a convention for the specification of the logical structure of real-world information. The cornerstone of the contemporary theory and technology of databases was the development of the Relational Data Model. The recent development of the new generation of data models - the semantic models - offers a simple, natural, implementation-independent, flexible, and nonredundant specification of information. The word semantic means that this convention closely captures the meaning of user's information and provides a concise, high-level description of that information.

SBM is one of several existing semantic models. The various semantic models are roughly equivalent and have common principles, even though they somewhat differ in terminology and in the tools they use. SBM is simpler than most other semantic data models: it has a small set of sufficient tools by which all of the semantic descriptors of the other models can be constructed. After mastering SBM, a systems analyst may wish to explore more complex semantic models.

This chapter defines and discusses the concepts of a database, a database management system (DBMS), a database schema, modeling real- world information, categorization of real-world objects, relations between objects, graphic representation of database schemas, integrity constraints, quality of database schemas, subschemas, userviews, database languages, services of DBMS, and multimedia databases.

1.1. Databases, DBMS, Data Models

General-purpose software system:
a software system that can serve a variety of needs of numerous dissimilar enterprises.
Example 1-1
A compiler for a programming language.

Application:
a software system serving the special needs of an enterprise or a group of similar enterprises.
Example 1-2
The registration of students in a university.

Application's real world:
all the information owned by and subject to computerization in an enterprise or all such information which is relevant to a self-contained application within the enterprise.
Example 1-3.
The examples of this text constitute a case study. Its application world is the educational activities of a university. The information contains:
Database:
an updatable storage of information of an application's world and managing software that conceals from the user the physical aspects of information storage and information representation. The information stored in a database is accessible at a logical level without involving the physical concepts of implementation.
Example 1-4
Neither a user nor a user program will try to seek the names of computer science instructors in track 13 of cylinder 5 of a disk or in "logical" record 225 of file XU17.NAMES.VERSION.12.84. Instead, the user will communicate with the database using some logical structure of the application's information.

Normally, a database should cover all the information of one application; there should not be two databases for one application.

Database management system, DBMS:
a general-purpose software system which can manage databases for a very large class of the possible application worlds.
Example 1-5
A DBMS is able to manage our university database and also completely different databases: an Internal Revenue Service database, an FBI WANTED database, a UN database on world geographical data, an Amtrak schedule, etc.
Instantaneous database:
all the information represented in a database at a given instant. This includes the historic information which is still kept at that time.

The actual information stored in the database changes from day to day. Most changes are additions of information to the database.
Example 1-6
A new student, a new instructor, new events of course offerings.

Fewer changes are deletions of information.

Example 1-7
Historic information past the archival period; a course offering which was canceled before it was given.

Some changes are replacements:
updates; correction of wrongly recorded information.
Example 1-8
Update of the address of a student; correction of the student's birth year (previously wrongly recorded).
Hence the life of a database can be seen as a sequence of instantaneous databases. The first one in the sequence is often the empty instantaneous database -- it is the state before any information has been entered.

Database model:
a convention of specifying the concepts of the real world in a form understandable by a DBMS. (Technically, it is an abstract data structure such that every possible instantaneous database of nearly every application's world can be logically represented by an instance of that data structure.)

The following database models will be studied in this text:

The Semantic Binary Model is the most natural of the above models. It is the most convenient for specifying the logical structure of information and for defining the concepts of an application's world. In this text, the other models will be derived from the Semantic Binary Model. The Relational, Network and Hierarchical models are dominant in today's commercial market of database management systems.

1.2. Semantic Modeling

1.2.1. Categorization of objects

Object:
any item in the real world. It can be either a concrete object or an abstract object as follows.
Example 1-9
Consider the application world of a university.

I am an object, if I am of interest to the university. My name is an object. The Information Systems Department and its name "Information Systems Department" are two distinct objects.

Value, or concrete object:
a printable object, such as a number, a character string, or a date. A value can be roughly considered as representing itself in the computer or in any formal system.
Example 1-10
My name and the name "Computer Science Department" are concrete objects. The grade 70 which has been given to a student in a course is also a concrete object.

Abstract object:
a nonvalue object in the real world. An abstract object can be, for example, a tangible item (such as a person, a table, a country) or an event (such as an offering of a course by an instructor) or an idea (such as a course). Abstract objects cannot be represented directly in the computer.

This term is also used for a user-transparent representation of such an object in the Semantic Binary Model.

Example 1-11
The Management Science Department, the student of the department whose name is Alex Johnson, and the course named "Chemistry" are three abstract objects.

Category:
any concept of the application's real world which is a unary property of objects. At every moment in time such a concept is descriptive of a set of objects which possess the property at that time.

Unlike the mathematical notion of a set, the category itself does not depend on its objects: the objects come and go while the meaning of the category is preserved in time. Conversely, a set does depend on its members: the meaning of a set changes with the ebb and flow of its members.

Categories are usually named by singular nouns.

Example 1-12
STUDENT is a category of abstract objects. The set of all the students relevant to the application today is different from such a set tomorrow, since new students will arrive or will become relevant. However, the concept STUDENT will remain unaltered.

An object may belong to several categories at the same time.

Example 1-13
One object may be known as a person and at the same time as an instructor and as a student.

Example 1-14
Some of the categories in the world of our university are: INSTRUCTOR, PERSON, COURSE, STUDENT, DEPARTMENT.

Disjoint categories:
Two categories are disjoint if no object may simultaneously be a member of both categories. This means that at every point in time the sets of objects corresponding to two disjoint categories have an empty intersection.

Example 1-15
The categories STUDENT and COURSE are disjoint; so are COURSE and DEPARTMENT (even though there may be two different objects, a course and a department, both named "Physics").

The categories INSTRUCTOR and STUDENT are not disjoint (Figure 1-1); neither are INSTRUCTOR and PERSON.

Subcategory:
A category is a subcategory of another category if at every point in time every object of the former category should also belong to the latter. This means that at every point in time the set of objects corresponding to a category contains the set of objects corresponding to any subcategory of the category.

Example 1-16
The category STUDENT is a subcategory of the category PERSON. The category INSTRUCTOR is another subcategory of the category PERSON (Figure 1-2).

Abstract category:
a category whose objects are always abstract.

Concrete category, category of values:
a category whose objects are always concrete.

Example 1-17
STUDENT and COURSE are abstract categories. STRING, NUMBER, and DIGIT are concrete categories.

Many concrete categories, such as NUMBER, STRING, and BOOLEAN, have constant-in-time sets of objects. Thus, those concrete categories are actually indistinguishable from the corresponding sets of all numbers, all strings, and the Boolean values ({TRUE, FALSE}).
Finite category:
A category is finite if at no point in time an infinite set of objects may correspond to it in the application's world.
Example 1-18
The categories STUDENT, COURSE, and DIGIT are finite. The category NUMBER may be infinite.

139q Every abstract category is finite.
Example 1-19
The database has a finite size. We cannot have an abstract category POINT containing information about every point in a plane.

1.2.2. Binary relations

Binary relation:
any concept of the application's real world which is a binary property of objects, that is, the meaning of a relationship or connection between two objects.
Example 1-20
WORKS-IN is a relation relating instructors to departments. MAJOR-DEPARTMENT relates students to departments. NAME is a relation relating persons to strings. BIRTH-YEAR is a relation relating persons to numbers.

At every moment in time, the relation is descriptive of a set of pairs of objects which are related at that time. The meaning of the relation remains unaltered in time, while the sets of pairs of objects corresponding to the relation may differ from time to time, when some pairs of objects cease or begin to be connected by the relation.
Notation:
xRy means that object x is related by the relation R to object y.
Example 1-21
To indicate that an instructor i works in a department d, we write:

i WORKS-IN d

1.2.2.1. Types of binary relations: m:m, m:1, 1:m, 1:1.

1. A binary relation R is many-to-one (m:1, functional) if at no point in time xRy and xRz where y=z.

Example 1-22
BIRTH-YEAR is an m:1 relation because every person has only one year of birth:

person1 BIRTH-YEAR 1970
person2 BIRTH-YEAR 1970
person3 BIRTH-YEAR 1969
person4 BIRTH-YEAR 1965

Example 1-23
MAJOR-DEPARTMENT is also an m:1 relation, since every student has at most one major department, as in Figure 1-3.

2. A binary relation R is one-to-many (1:m) if at no point in time xRy and zRy where x=z.

Example 1-24
The relation MAJOR-DEPARTMENT is not 1:m, since a department may have many major students.

If, instead of the relation MAJOR-DEPARTMENT, we have the relation MAJOR-STUDENT between departments and students, then this relation would be 1:m, since every student can have at most one major department.

3. Relations which are of neither of the above types are called proper many-to-many (m:m).

Example 1-25
WORKS-IN is a proper m:m relation because every instructor can work in many departments and every department may employ many instructors, as in Figure 1-4.

4. A binary relation which is both m:1 and 1:m (always) is called one-to-one (1:1).

Example 1-26
If courses are identified by their names, then the relation COURSE-NAME is 1:1, meaning that every course has at most one name, and no character string is the name of two different courses, as in Figure 1-5.

Example 1-27
Suppose that in the current situation in our real world, the following is true:

This does not mean that NAME is a 1:1 relation between persons and strings. NAME would be a 1:1 relation if condition a were true at all times: past, present, and future.

5. A binary relation is proper m:1 if it is m:1 and not 1:1.

6. A binary relation is proper 1:m if it is 1:m and not 1:1.

Example 1-28
All of the types of relations mentioned in the previous example are proper.

Since the COURSE-NAME is 1:1, it is also 1:m, m:1, and m:m. Since this relation is proper 1:1, it cannot be proper 1:m, proper m:1, or proper m:m.

1.2.2.2. Categories as domains and ranges of relations

Domain and range of a binary relation:

Domain of relation R:
a category C that satisfies the following two conditions: a. Whenever xRy, then x belongs to C (at every point in time for every pair of objects) b. No proper subcategory of C satisfies condition a Range of relation R - a category C that satisfies: a. Whenever xRy, then y belongs to C (at every point in time for every pair of objects) b. No proper subcategory of C satisfies a
Example 1-29
The domain of COURSE-NAME is the category COURSE and its range is the category STRING. The domain of WORKS-IN is INSTRUCTOR and the range is DEPARTMENT.

Total binary relation:
A relation R whose domain is C is total if at all times for every object x in C there exists an object y such that xRy. (At different times different objects y may be related to a given object x.)

Note: No relation needs to be total on its domain.

Example 1-30 Although the domain of the relation BIRTH-DATE is the category PERSON, the date of birth of some relevant persons is irrelevant or unknown. Thus, the relation BIRTH-DATE is not total.

1.2.2.3. Attributes

Some binary relations are often called attributes.

Attribute:
A functional relation (i.e., m:1 or 1:1) whose range is a concrete category.
Example 1-31 Example 1-32
Last-name, first-name, and birth-year are attributes of PERSON.

1.2.3. Nonbinary relationships

Nonbinary relationships:
real-world relationships that bind more than two objects in different roles.
Example 1-33
There is a relationship between an instructor, a course, and a quarter in which the instructor offers the course.

Such complex relationships are regarded in the Semantic Binary Model as groups of several simple relationships.

Example 1-34
The nonbinary relationship of the previous example is represented in the Semantic Binary Model by a fourth object, an offering, and three binary relations between the offering and the instructor, the quarter, and the course, as in Figure 1-7.

In  general,  the  Semantic  Binary  Model  represents  any  nonbinary
relation as:

     a.   An abstract category of events.  Each event  symbolizes  the
          existence of a relationship between a group of objects.

     b.   Functional binary relations, whose domain is  category  (a).
          Each  of  those functional binary relations corresponds to a
          role played by some objects in the nonbinary relation.

Thus, the  fact  that  objects  x ,..., x   participate  in  an  n-ary
                                 1       n
relation R in roles R ,..., R   is represented by:
                     1       n

     a.   An object e in the category R'

     b.   Binary relationships eR x ,..., eR x
     -                           1 1        n n

Example 1-35.

The information about a course offered by an instructor during
a  quarter  could  be  considered  a  ternary relation between
instructors, courses, and quarters.  In  the  Semantic  Binary
Model,  we solve this problem by representing this information
as a category COURSE-OFFERING and three  functional  relations
from  COURSE-OFFERING:  THE-INSTRUCTOR,  THE-COURSE,  and THE-
QUARTER.

Instructor i has offered course c in quarter q if and only  if
there exists a course-offering o, such that:

                o  THE-INSTRUCTOR  i

                o  THE-COURSE  c

                o  THE-QUARTER  q

1.2.4. Instantaneous databases

Formal representation of an instantaneous binary database - as a set of facts, unary and binary: Unary fact - a statement that a certain abstract object belongs to a certain category. Example 1-36. (The person whose name is "Jane Howards") is a student. Binary fact - a statement that there is a certain relationship between two given objects. Example 1-37. The birth-year of (the person whose name is "Jane Howards") is 1968. Example 1-38. (The instructor whose name is "John Smith") works in (the department whose name is "Information Systems") Example 1-39. The final grade of (the enrollment of (the student whose name is "Jack Brown") in (the offering of (the course named "Basic Chemistry") by (the instructor named "Veronica Hammer") during (the Fall 1900 quarter))) is 100. Although this fact relates to the past, it is still relevant and thus is a part of today's instantaneous database. Note: In order to be in the current instantaneous database, the fact must have been explicitly or implicitly entered at some time and never canceled since.

1.2.5. Semantic binary schemas

A semantic binary schema is a description of the names and the properties of all the categories and the binary relations existing in an application's world. All the instantaneous databases under the schema should have only those categories and relations listed in the schema. The sets of pairs of objects corresponding, in the instantaneous database, to the categories, and the sets of pairs of objects corresponding to the relations, should satisfy the properties indicated in the schema. The schema should list the following properties of the categories and relations: the subcategories, the domains and ranges of the relations, and the types of the relations (proper m:m, proper m:1, proper 1:m, 1:1). Semantic Binary Model is a convention of specification of the structure of the real-world information in the form of a semantic binary schema. Technically, the model can be regarded as the set of all the possible binary instantaneous databases.

1.2.6. Schema diagrams

This section shows how schemas can be represented graphically in two dimensions. 1. In a schema diagram, categories are shown by rectangles. Example 1-40. [] STUDENT - category 2. Relations from abstract categories to concrete categories are shown inside the boxes of the domain-categories as follows: relation : range type The range is specified as a programming language data-type. (We will use the style of Pascal here.) Example 1-41. [] DEPARTMENT - category [] name - attribute of DEPARTMENT, range: String (1:m) (A department may have several names, but every name is unique.) Usually, relations between abstract and concrete categories are m:1. This is the default type of relations whose ranges are concrete categories, and it need not be explicitly specified in the schema for such relations. Example 1-42. [] PERSON - category [] last-name - attribute of PERSON, range: String (m:1) [] first-name - attribute of PERSON, range: String (m:1) [] birth-year - attribute of PERSON, range: 1870..1990 (m:1) [] address - attribute of PERSON, range: String (m:1) 3. Relations between abstract categories are shown by arrows between the categories' rectangles. (The direction of the arrow is from the domain to the range.) The name and type of the relation are indicated on the arrow. The default for the type of relations between abstract categories is m:m. Example 1-43. Figure 1-8 represents: [] DEPARTMENT - category [] INSTRUCTOR - category [] works-in - relation from INSTRUCTOR to DEPARTMENT (m:m) Figure 1-8. A m:m relation. 4. Subcategories' rectangles are connected to their supercategories' rectangles by arrows with dashes. Example 1-44. Figure 1-9 represents: [] PERSON - category [] STUDENT - subcategory of PERSON Figure 1-9. Subcategories. 5. The disjointness of categories is indicated implicitly: a. Two categories which have a subcategory in common are not disjoint. (The common subcategory does not have to be their immediate subcategory, that is, it may be a subcategory of a subcategory and so on.) Example 1-45. No two of the categories in Figure 1-10 are disjoint from each other. [] SMART-INSTRUCTOR - category [] SMART-PERSON - category [] INSTRUCTOR - category [] SMART-INSTRUCTOR - subcategory of SMART-PERSON [] SMART-INSTRUCTOR - subcategory of INSTRUCTOR Figure 1-10. Intersecting categories. b. Two categories which are subcategories of one category (not necessarily immediate subcategories) are considered not disjoint, unless otherwise declared in an appendix to the schema. Example 1-46. No two of the categories in Figure 1-11 are disjoint from each other. [] PERSON - category [] STUDENT - subcategory of PERSON [] INSTRUCTOR - subcategory of PERSON Figure 1-11. Intersecting categories. c. The other categories are disjoint from each other, unless otherwise declared in an appendix to the schema. Example 1-47. [diagram skipped] Example 1-48. A semantic schema for a university application is given at the end of this book. This will be the principal reference schema used in the examples throughout the book. 1.2.7. Abstraction of a database storage structure The physical implementation of a database is a responsibility of the DBMS. The implementation of the database should be completely transparent to the database users, including the database designers and systems analysts. Nevertheless, it may be helpful to the reader to have a general idea of how the database may be implemented. This section shows a possible idea of database implementation. This is a simplistic implementation. Efficiency is not a concern here. It would be, of course, a concern in any actual implementation. We can represent the abstract objects as integers. The DBMS will enumerate all the abstract objects. The numbers assigned to the objects will be invisible to the user, as will all the implementational details. We can represent the instantaneous database as a large table. Every row contains a fact: a unary fact (object -- category), or a binary fact (object -- relation -- object). This table can be implemented as a file. Example 1-49. The following is a fragment of a file representing an instantaneous database. The fragment consists of one unary fact and three binary facts. | object# | relation or category | object# or value | |-----------+----------------------------+----------------------+ | o21 | COURSE-ENROLLMENT | | | o21 | THE-STUDENT | o18 | | o21 | THE-OFFERING | o17 | | o21 | FINAL-GRADE | 100 | Example 1-50. | | ------- - -- | | | | The following figure shows an implementation of an | | instantaneous database for the university application. It | | specifies the instantaneous database which will be used in | | examples throughout this book. | | | | | | category | object# | relation | object# or value | +--------------+------------+----------------+----------------------+ | DEPARTMENT | o1 | NAME | Computer Science | | | o1 | NAME | CS | | | o2 | NAME | Mathematics | | | o2 | NAME | Math | | |------------+----------------+----------------------+ | | o3 | NAME | Physics | | | o4 | NAME | Arts | | |------------+----------------+----------------------+ | | o5 | NAME | Economics | | COURSE | o6 | NAME | Databases | | |------------+----------------+----------------------+ | | o7 | NAME | Gastronomy | | | o8 | NAME | Football | +--------------+------------+----------------+----------------------+ | QUARTER | o9 | YEAR | 1990 | | | o9 | SEASON | Fall | | | o10 | YEAR | 1990 | | | o10 | SEASON | Winter | | |------------+----------------+----------------------+ | | o11 | YEAR | 1990 | | | o11 | SEASON | Spring | | INSTRUCTOR | o12 | LAST-NAME | Brown | | | o12 | FIRST-NAME | George | | | o12 | BIRTH-YEAR | 1956 | | | o12 | ADDRESS | 112 Lucky Dr. | | | o12 | WORKS-IN | o1 | | | o12 | WORKS-IN | o2 | | |------------+----------------+----------------------+ | | o13 | LAST-NAME | Watson | | | o13 | FIRST-NAME | Mary | | | o13 | BIRTH-YEAR | 1953 | | | o13 | ADDRESS | 231 Fortune Dr. | | | o13 | WORKS-IN | o3 | | | o14 | LAST-NAME | Blue | | | o14 | FIRST-NAME | John | | | o14 | BIRTH-YEAR | 1950 | | | o14 | ADDRESS | 536 Orange Dr. | | | o14 | WORKS-IN | o2 | Figure 1-12. An instantaneous semantic database for the university application. | category | object# | relation | object# or value | +------------------+----------+-----------------+-------------------+ |COURSE-OFFERING | o15 | THE-INSTRUCTOR | o12 | | | o15 | THE-COURSE | o6 | | | o15 | THE-QUARTER | o9 | | | o16 | THE-INSTRUCTOR | o12 | | | o16 | THE-COURSE | o7 | | | o16 | THE-QUARTER | o9 | | |----------+-----------------+-------------------+ | | o17 | THE-INSTRUCTOR | o12 | | | o17 | THE-COURSE | o8 | | | o17 | THE-QUARTER | o9 | |STUDENT | o18 | LAST-NAME | Victory | | | o18 | FIRST-NAME | Elizabeth | | | o18 | BIRTH-YEAR | 1966 | | | o18 | ADDRESS | 100 Sun St. | | | o18 | MAJOR | o1 | | | o18 | MINOR | o5 | | |----------+-----------------+-------------------+ | | o19 | LAST-NAME | Howards | | | o19 | FIRST-NAME | Jane | | | o19 | BIRTH-YEAR | 1965 | | | o19 | ADDRESS | 200 Dorms | | | o19 | MAJOR | o4 | | | o19 | MINOR | o5 | | | o20 | LAST-NAME | Wood | | | o20 | FIRST-NAME | Michael | | | o20 | BIRTH-YEAR | 1964 | | | o20 | ADDRESS | 110 Dorms | | | o20 | MAJOR | o4 | | | o20 | MINOR | o5 | +------------------+----------+-----------------+-------------------+ |COURSE-ENROLLMENT | o21 | THE-STUDENT | o18 | | | o21 | THE-OFFERING | o17 | | | o21 | FINAL-GRADE | 100 | | | o22 | THE-STUDENT | o19 | | | o22 | THE-OFFERING | o17 | | | o22 | FINAL-GRADE | 70 | | |----------+-----------------+-------------------+ | | o23 | THE-STUDENT | o19 | | | o23 | THE-OFFERING | o15 | | | o23 | FINAL-GRADE | 80 | Figure 1-12. Continued. 1.3. Integrity Constraints Integrity constraints (synonyms: integrity law, integrity rules) - rules attached to a database in order to detect obvious user errors when updating the database. 1. Static integrity constraints - rules to detect instantaneous databases which cannot correspond to any probable state of the application's world in the past, present, or future, regardless of the database's update history. | | Example 1-51. | | ------- - -- | | | | These are some static integrity constraints in our university: | | | | | | o No one has two last names | | | | | | o Every student has at most one major department | | | | | | o First names of people are composed only of letters | | | | | | o Students may not participate in a course before they | | were born or receive a grade in a course before they | | are 15 years old. | | | | | A static integrity law can be regarded as a Boolean function from the set of all the instantaneous databases which are well-formed according to the schema or the database model. This function assigns the value false to those instantaneous databases which cannot correspond to any probable state of the application's world. 2. Dynamic integrity constraints - as above, but the domain of the function is the set of transitions between instantaneous databases, and false is assigned to highly improbable transitions between states of the application's world. | | Example 1-52. | | ------- - -- | | | | The following is a dynamic integrity constraint: | | | | | | The catalog of courses is unerasable -- a course, once | | entered, may not be removed. | | | | | | | Example 1-53. | | ------- - -- | | | | | | | | If we wish to record the sex of persons in our database and we | | are sure that nobody's sex is ever recorded wrongly (this is | | usually a dangerous and unreasonable assumption), and we | | further assume that a woman cannot become a man, then the | | following would be another dynamic constraint: | | | | | Once the sex of x is female in an instantaneous database, | | ------ the sex of x remains female in the next instantaneous | | ------ | database. | | | | | Note: (i) Very often some of the integrity constraints are captured by the schema. (ii) Integrity constraints should be distinguished from implementational restrictions: Implementational restrictions - the inability of the database to represent some possible situations of the application's world or the inability to represent them in a logical, natural, nonredundant, error-avoiding, flexible way. Implementational restrictions are caused by considerations such as hardware, software, database model, effort, time, and expenses. | | | Example 1-54. | | ------- - -- | | If for the application world of the university we use a | | database model less powerful than the Semantic Binary Model, | | then our implementational considerations may require that | | every instructor is uniquely identified by social security | | number. This is not the case in the real world of the | | university, because sometimes an instructor receives a social | | security number only several months after being hired and | | becoming of relevance to the university database. | | | | The aforementioned is a static implementational restriction. | | To cope with it we have either to delay the recording of new | | instructors or to supply them with some temporary numbers. | | | | If our implementation further requires that the social | | security number of a person should remain constant in time, | | then this would constitute a dynamic restriction. Supplying | | temporary numbers would not help in this case. Also, this | | dynamic restriction would not allow for correction of a | | wrongly recorded social security number (due to a data-entry | | clerk's mistake). In practice, such a correction may be | | possible but with an extremely high cost in terms of the | | programming effort and with a chance to inadvertently corrupt | | the database. | | | | | 1.4. Schema Quality Criteria We have defined the term schema for the Semantic Binary Model. The following is a more general definition of the term, regardless of the database model. Schema - a description, in terms of a database model, of the concepts and the information structure of an application's world. It may be the actual data structure of a database. A schema describes all the possible instantaneous databases for one given application's world. | | Example 1-55. | | ------- - -- | | | | A schema for our university application should outline the | basic relevant concepts of the university, such as student, | | ------- instructor, etc., and the kinds of information to be gathered | | ---------- | about them. The schema will not allow the database to contain | | information about salaries of the instructors or about girl- | | or boyfriends of students since these are outside the scope of | | the application's world. | | | | | A schema is called high quality if it satisfies the following criteria: 1. Natural The schema describes the concepts of its application's world naturally: o The schema describes the objects, categories, and relations as they are in the real world. o The users can translate ideas easily in both directions between the concepts of the schema and the natural concepts of the application world. 2. Nonredundant The schema contains very little or no redundancy. Redundancy is the possibility of representing a fact of the application's world more than once in the same instantaneous database (so that if one of the representations is removed from the database, no information is lost; that is, all of the information represented by the instantaneous database remains unaltered). The redundancy should be avoided not in order to improve the storage efficiency -- the storage is not that expensive nowadays. Moreover, the redundancy in the schema is not directly related to the redundancy in the physical storage: a logically nonredundant schema may be physically implemented by a redundant physical structure in order to improve the access-time efficiency. The redundancy should be avoided primarily in order to prevent inconsistency of the database and its update anomalies. When two facts in the database represent the same information, and that information is updated, the user may forget to update both facts. In this case, after the update the two facts would contradict. This contradiction may cause unpredictable behavior of many application programs. The ramifications would be much worse than the local incorrectness of a fact in the database. When the redundancy is needed for the convenience of the users, it should be introduced into the userviews (to be defined in the next section) but not into the schema. | | | Example 1-56. | | ------- - -- | | The following is a fragment of a redundant wrong schema: | | ----- | | [] COURSE-ENROLLMENT - category | | ------ ---------- | | [] final-grade - attribute of COURSE-ENROLLMENT, range: | | 0..100 (m:1) | | - --- - - | | [] student's-address - attribute of COURSE-ENROLLMENT, | | range: String (m:1) | | ------ - - | | | | Suppose a student s, whose address is a, has two enrollments, | | e and e . Then, the following facts (among others) are | 1 2 | logically recorded in the database: | | | | a. s ADDRESS a | | - ------- | | b. e THE-STUDENT s | | - 1 --- ------- | | c. e THE-STUDENT s | | - 2 --- ------- | | d. e STUDENT'S-ADDRESS a | | - 1 ------- - ------- | | e. e STUDENT'S-ADDRESS a | | - 2 ------- - ------- | | The facts (d) and (e) can be inferred from the facts (a) | | through (c). Thus, (d) and (e) can be omitted from the | | database without altering the information represented by | | the database. These facts are redundant; the schema | | should not have allowed their entrance in the first | | place. | | | | | | If the following relation were omitted from the above | | wrong schema, | | | | | [] address - attribute of PERSON, range: String | | ------- ------ ------ (m:1) | | - - | | | | then the schema would still remain redundant and wrong, | | because fact (e) can be inferred from fact (d). | | (Additionally, it would be a problem to record the | | address of a student who has no enrollments yet). | | | | | In some database models we cannot eliminate the redundancy completely. When we have to have some redundancy, we should at least bind it by integrity constraints. When such constraints are implemented, the user is forced to update all the related facts simultaneously. | | Example 1-57. | | ------- - -- | | | | If we cannot avoid having the redundant schema of the previous | | example, we can at least try to enforce the following | | integrity constraint: | | | | | | Whenever | | | | | s ADDRESS a and e THE-STUDENT s | | ------- --- ------- | | | | then | | | | | e STUDENT'S-ADDRESS a | | ------- - ------- | | | 3. Nonrestrictive The schema does not impose implementational restrictions; that is, every situation probable in the real world of the application is fully representable under the schema. | | | Example 1-58. | | ------- - -- | | A schema containing the following relation would prevent very | | senior citizens from entering our university: | | | | | [] birth-year - attribute of PERSON, range: 1900..2000 | | ----- ---- ------ ---- ---- (m:1) | | - - | | | | This schema imposes an avoidable implementational restriction, | | and thus it is wrong. | | | | | 4. Covering integrity constraints The schema covers by itself as many integrity constraints as possible; that is, the class of instantaneous databases formally possible according to the schema is not much larger than the class of all possible situations of the real world. Constraints that are not expressed in the schema cause these problems: o They are hard to formulate and to specify. o They are seldom enforced by the DBMS. Thus, they require a substantial application programming effort for their enforcement, are often implemented incorrectly, and usually prevent direct interaction between the user updating the database and the DBMS (the user may not use the standard language for simple updates, which is supplied by most DBMS). o The users and application programmers often forget or misunderstand such constraints. [] DEPARTMENT - category [] INSTRUCTOR - category [] works-in - relation from INSTRUCTOR to DEPARTMENT (m:m) Figure 1-13. A good schema. | | | Example 1-59. | | ------- - -- | | Figure 1-14 shows a fragment of a poor schema, with respect to | | the coverage of the integrity constraints by the schema. | | | | It requires an integrity constraint not expressed in the | | schema: | | | | For no instructor are there two events of his or her | | work in the same department. | | A better schema fragment is in Figure 1-13. | | | | | [] DEPARTMENT - category [] WORK - category [] INSTRUCTOR - category [] the-department - relation from WORK to DEPARTMENT (m:1) [] the-instructor - relation from WORK to INSTRUCTOR (m:1) Figure 1-14. A poor schema. 5. Flexible The schema is flexible: if probable changes in the concepts of the application world occur, the schema would not have to undergo drastic changes. 6. Conceptually minimal The schema is conceptually minimal: it does not involve concepts which are irrelevant in the application's real world and limits the accumulation of information which is irrelevant in that world. | | Example 1-60. | | ------- - -- | | | | The following would be irrelevant and wrong in the schema of | | our university: | | | | | [] BEAUTIFUL - subcategory of STUDENT | | --------- ------- | | | | Example 1-61. | | ------- - -- | | | | The following would be irrelevant and wrong in the schema of | | our university, unless it is unavoidable due to technical | | problems: | | | | | [] enrollment-number - attribute of ENROLLMENT, range: | | ---------- ------ ---------- Integer (m:1) | | ------- - - | | | The most important issue of the database design is the design of a high-quality schema within the restrictions of the available DBMS and database model. A low-quality schema increases the chances of corruption of the data, makes it very hard to use and maintain the database, and makes it very hard, if not impossible, to adjust the database to the changing concepts of the application's real world. It is easy to design a high-quality schema in the Semantic Binary Model. The task is much harder in most other models. Moreover, it is usually impossible to describe an application world by a schema in the Relational, Network, or Hierarchical model with the same high quality as with which that application can be described in the Semantic Binary Model. The following chapters will introduce methodologies to design conceptually adequate schemas in the Relational, Network, and Hierarchical models. Those schemas will be close to the highest quality possible within the restrictions of the respective models. In those methodologies, a semantic binary schema is designed first, and then the schema is translated into the model supported by the DBMS which will service the application. 1.5. Subschemas and Userviews Subschema -- a part of the schema, provided this part in itself can constitute a schema for some application world. | | Example 1-62. | | ------- - -- | | | | The following figure shows a schema for a very small | | application world. The only relevant information in that world | | is the names of the courses. | | | | | [] COURSE - category | | ------ | | | [] name - attribute of COURSE, range: String (m:1) | | ---- ------ ------ - - | | | | | | | | This schema is a subschema of the University Binary Schema of | Figure . | | --- | | | A subschema of a binary schema can be obtained by removing some of the categories and some of the relations (provided, whenever a category is removed, every relation whose domain or range is the category is also removed). The primary use of the subschemas is to provide subpopulations of the database users with a partial view of the database information. The user of a subschema may regard it as if it were the entire schema and need not be aware of the existence of the information beyond the subschema. The DBMS will conceal from such a user all the information beyond the subschema. This brings the following benefits: o Users do not have to understand the information concepts which are irrelevant to their activities. o Users are prevented from accidentally corrupting the information which they had no business to access in the first place but accessed in error instead of the relevant information. o Malicious users are prevented from accessing information beyond that which they are entitled to access. o When the database is extended by adding new concepts to the schema, and those new concepts do not affect some existing programs using subschemas, those programs need not be modified. | | Example 1-63. | | ------- - -- | | | | The subschema of Figure 1-15 covers the names of the courses | | and the seasons in which the courses are offered. | | | | | [] QUARTER - category [] COURSE - category [] COURSE-OFFERING - category [] season - attribute of QUARTER, range: String (m:1) [] name - attribute of COURSE, range: String (m:1) [] the-course - relation from COURSE-OFFERING to COURSE (m:1) [] the-quarter - relation from COURSE-OFFERING to QUARTER (m:1) Figure 1-15. Subschema for seasons of courses. Normally, the schema is partitioned into nondisjoint subschemas according to the needs of the different divisions within the enterprise and the different subapplications. [diagram skipped] Figure 1-16. The users access the database through subschemas. [diagram skipped] Figure 1-17. The schema is partitioned into subschemas, which need not be disjoint. | | Example 1-64. | | ------- - -- | | | | Figure 1-18 shows the subschema used by the Personnel Office. | | | | | [] DEPARTMENT - category [] INSTRUCTOR - category [] QUARTER - category [] COURSE-OFFERING - category [] works-in - relation from INSTRUCTOR to DEPARTMENT (m:m) [] name - attribute of DEPARTMENT, range: String 1:m (m:1) [] last-name - attribute of INSTRUCTOR, range: String (m:1) [] first-name - attribute of INSTRUCTOR, range: String (m:1) [] birth-year - attribute of INSTRUCTOR, range: 1870..1990 (m:1) [] address - attribute of INSTRUCTOR, range: String (m:1) [] year - attribute of QUARTER, range: 1980..1995 (m:1) [] season - attribute of QUARTER, range: String (m:1) [] the-instructor - relation from COURSE-OFFERING to INSTRUCTOR (m:1) [] the-quarter - relation from COURSE-OFFERING to QUARTER (m:1) Figure 1-18. A subschema for the University Personnel Office. A generalization of the subschema concept, the userview, is defined in the remainder of this section. Inference rules - rules by which new information can be deduced from the information that the users have entered into the database. | | Example 1-65. | | ------- - -- | | | | The following are some parts of the information recorded in | | the university database: | | | | | | o For every instructor, the classes he or she teaches | | | | | | o For every student, the classes he or she takes | | | | | | An inference rule: | | | | | | o If s takes a class taught by p, then p TEACHES s. | | | | | Userview - an alternative view on the application world. A userview is a means of alternative comprehension of a part or all of the application world's information. A userview consists of an alternative schema and inference rules by which every instantaneous database characterized by the original schema implies the (logical) instantaneous database characterized by the alternative schema (Fig. 1-19). [diagram skipped] Figure 1-19. A userview. | | Example 1-66. | | ------- - -- | | | | Figure 1-20 shows the alternative schema of the userview that | | covers the names of the courses and the seasons in which the | | courses are offered. It is more convenient to the user than | the subschema of Figure . | | --- | | | [] COURSE-SEASON - category [] name - attribute of COURSE-SEASON, range: String (m:1) [] season - attribute of COURSE-SEASON, range: String m:m (m:1) Figure 1-20. A userview: seasons of courses. One userview can be used by a subpopulation of the application world's users. Such a userview would conceal from those users all the information which is irrelevant for them. The remaining information is presented to these users in a form which is most convenient to these particular users. | | Example 1-67. | | ------- - -- | | | | The computer science faculty secretary might use a userview | | containing only the addresses of the faculty. The userview | | has the alternative schema shown in Figure 1-21. | | | | | [] COMPUTER-SCIENCE-INSTRUCTOR - category | | -------- ------- ---------- | | | [] last-name - attribute of | | ---- ---- COMPUTER-SCIENCE-INSTRUCTOR, range: String (m:1) | | -------- ------- ---------- ------ - - | | | [] first-name - attribute of | | ----- ---- COMPUTER-SCIENCE-INSTRUCTOR, range: String (m:1) | | -------- ------- ---------- ------ - - | | | [] address - attribute of COMPUTER-SCIENCE-INSTRUCTOR, | | ------- -------- ------- ---------- range: String (m:1) | | ------ - - | | | Figure 1-21. A userview: computer science instructors. | | ------ - -- | | | | | | | The inference rule for the category COMPUTER-SCIENCE- | | -------- ------- INSTRUCTOR is: | | ---------- | | | A computer science instructor is an INSTRUCTOR who | | ---------- WORKS-IN the DEPARTMENT whose NAME is "Computer | | ----- -- ---------- ---- | Science." | | | | | | | Example 1-68. | | ------- - -- | | | | A subpopulation of the users is interested only in knowing who | | taught whom. Their userview has the alternative schema shown | | in Figure 1-22. | | | | | The inference rule for the relation TAUGHT has been given on | | ------ page . The other concepts of the alternative schema are | | --- | copied from the schema. | | | | | [] PERSON - category [] STUDENT - subcategory of PERSON [] INSTRUCTOR - subcategory of PERSON [] taught - relation from INSTRUCTOR to STUDENT (m:m) [] last-name - attribute of PERSON, range: String (m:1) [] first-name - attribute of PERSON, range: String (m:1) [] birth-year - attribute of PERSON, range: 1870..1990 (m:1) [] address - attribute of PERSON, range: String (m:1) Figure 1-22. A userview: the inferred relation taught. Some userviews do not omit any information and can be used by all the users. In this case, a userview presents the same information as the schema does, but in a form most suitable for some particular purposes. All the information which can be deduced from an instantaneous database idb by a userview u is called the instantaneous database under userview u corresponding to idb. [diagram skipped] Figure 1-23. Instantaneous database under a userview. Unlike the schema, the alternative schema of a userview may contain redundant information if it adds to the convenience of the users. This redundancy cannot cause inconsistency, as it would in the case of the schema redundancy, since the updates are translated into the terms of the schema updates, before the updates are actually performed. The usage of userviews also greatly enhances the flexibility of the database. Suppose we have a program that uses a userview, and the concepts of the application world change. o If the change does not affect the logical decisions of this particular program, then we would like to avoid the need to modify the program. o So, we will define a new userview to be used by the program. o The alternative schema of the userview would be identical to the alternative schema of the old userview, so the program would not notice the difference. o The inference rules would be, of course, different. [diagram skipped] Figure 1-24. A change in the schema does not have to affect some users and programs. | | Example 1-69. | | ------- - -- | | | | Suppose we have some programs that use the information of the | | relation | | | | | [] works-in - relation from INSTRUCTOR to DEPARTMENT | | ----- -- ---------- ---------- (m:m) | | - - | | | | Now, the university has become more sophisticated, and for | | certain future programs it will be important to know the | | percentage of time that the instructors work for the | | departments. This new information is irrelevant to most old | | programs. The new schema will contain the fragment shown in | | Figure 1-25. | | | | | [] DEPARTMENT - category | | ---------- | | | [] WORK - category | | ---- | | | [] INSTRUCTOR - category | | ---------- | | | [] percentage-of-time - attribute of WORK, range: | | ---------- -- ---- ---- 0..100% (m:1) | | - --- - - | | | [] the-department - relation from WORK to DEPARTMENT | | --- ---------- ---- ---------- (m:1) | | - - | | | [] the-instructor - relation from WORK to INSTRUCTOR | | --- ---------- ---- ---------- (m:1) | | - - | | | | The alternative schema for the unaffected old programs will | | remain unchanged and will still contain the old relation | WORKS-IN, but the inference rule of their new userview will | | ----- -- | be: | | | | | If there is a WORK whose department is d and whose | | ---- instructor is i, then i WORKS-IN d. | | ----- -- | | | Figure 1-25. A modified schema: a new category WORK. Note: A subschema is a userview whose alternative schema is a part or all of the original schema, and the inference rules are trivial: they copy the information of the alternative schema's concepts. 1.6. Services of DBMS 1.6.1. Database languages: Query, DML, DDL, and others This section introduces the major types of languages supported by database management systems. Retrieval query -- a specification of information which a user wants to extract or to deduce from the database without knowing the full extension of the instantaneous database. | | Example 1-70. | | ------- - -- | | | | The following is a retrieval query specified in English: | | | | | ``What students failed the Databases course last | | --------- | year?'' | | | | | | Most DBMSs do not understand English. Thus, a specification | | in a more formal language is needed. Such a formal language | | need not be a programming language. Most DBMSs support some | | user-oriented formal language, in which the users can specify | | queries without writing programs. | | | | | Query language - a nonprogramming language in which a user can formulate retrieval queries and possibly also update the database. Nonprogramming means that the user does not have to specify an algorithm for the problem but only has to define the problem in a formal way. Some query languages are simple enough to be used by the end user -- a database user who has no computer knowledge or experience. Other, more complex, languages are used by computer professionals. Yet, the professional user can save a significant amount of time by using such a nonprogramming language rather than writing a program. Data manipulation language, DML - a programming language that has a powerful capability of computations, flow of control, input- output, and also has syntactic constructs for database access (the update, retrieval, and dynamic exchange of information between the program and the database). The DML is used by the application programmer. A data manipulation language may be: 1. A stand-alone DML - a special-purpose language. In this case, the DBMS provides a compiler or an interpreter for the DML. The disadvantage of the stand-alone language is that it cannot be used for complex programs which perform some database access but also simultaneously perform other tasks, for example, numeric calculation or industrial assembly line monitoring. 2. A system call interface. In this case, the user writes a program in a regular programming language. The user performs database access by subroutine calls to the DBMS. In a call, the user provides the system with a description of the user's request, parameters of the request, and output variables in which the system will produce the result of the database access. | | | Example 1-71. | | ------- - -- | | The following is a fragment of a Pascal program with system | | calls. | | | | ... | | | | last-name := `Jefferson' | | | | dbmscall ( | | | | `Dear DBMS: Please find the instructor whose last | | name is given in the second argument of this | | call. Place a reference to that instructor | | into the third argument of this call. If | | everything is OK, set the fourth argument of | | this call to 0. If there are several | | instructors by the given name, set the fourth | | argument to 1. If there are no such | | instructors, set it to 2. If another problem | | occurs, set the fourth argument to a number | | greater than 2.' (* In a real program, of | | course, a code would be given instead of this | | "short story." *), | | | | last-name, instructor-reference, return-code); | | | | if (return-code > 0) | | | | then write (` A DBMS error.') | | | | else begin | | | | dbmscall ( | | | | `Dear DBMS: Please relate the instructor, | | referred to by the second argument of | | this call, by the relation BIRTH- | | YEAR to the number given in the third | | argument. If everything is OK, set | | the fourth argument of this call to | | 0. If a problem occurs, set the | | fourth argument to a number greater | | than 0.' (*Of course, in a real | | system call, one would abbreviate | | this "short story" by a code*), | | | | | | instructor-reference, 1960, return-code); | | | | | | if (return-code > 0) | | | | | | then write (` A DBMS error.') | | | | | | end | | | | | The system-call interfaces are usually very unfriendly and hard to program in. The system-calls are interpreted at run-time of the program. One ramification of this is that if a system call contains a syntactically incorrect request, or a request inconsistent with the schema, the user cannot be notified at compile-time but has to wait until the program aborts. Then, the partial effects of the aborted program would have to be undone. Another ramification is the poor efficiency of the run-time interpretation versus compilation. 3. A DML embedded in a host programming language. This is a database access extension of a general-purpose programming language. In a program, the host language statements are interleaved with DML statements. | | Example 1-72. | | ------- - -- | | | | The following is a two-statement fragment of an application | | program written in an embedded DML language whose host | | language is Pascal. | | | | | | ... | | | | | | write (`This is a regular Pascal statement which prints | | this sentence.'); | | | | | relate: i WORKS-IN d (*This is a DML statement*) | | ----- -- | | | The DBMS precompiles the program into a program in the host language without the DML statements. During the precompilation, the DBMS validates the syntax and the compatibility with the schema of the database. The DBMS may also perform optimization of the user's algorithm. The resulting program is compiled by the host language compiler. When the program runs, it may communicate to the DBMS, but the system calls of this communication are transparent to the user. Report generator -- a language in which the user can specify a query together with requirements on the visual form of the output, such as nicely printed tables with titles, or bottom-of-page summaries. Data entry language -- a language in which the end users can specify database updates online. Data definition language, DDL - a language in which the logical structure of information in the application's world can be defined, together with its pragmatic interpretation for the management of a database, including the schema, integrity constraints, and userviews. Many DDLs also allow for modification or redefinition of the database structure. This is needed when the concepts of the application world change and when the database designer finds a better description of the existing real-world concepts (particularly during the initial database design process, which is often trial and error). In many DBMSs, the integrity constraints are specified in languages other than the DDL. In some DBMSs the constraint are incorporated in DML programs. Many DBMSs do not provide any language at all to express the integrity constraints, other than those expressed in the schema. The other integrity constraints are left as the application programmer's responsibility. In this case, an application programmer should write a DML program which will: a. Collect the requests for updates from the users b. Check whether these requests would violate the integrity constraints c. Reject the requests that would violate the integrity d. Submit to the DBMS the remaining good requests 1.6.2. Other services and utilities of DBMS Most database management systems provide some or all of the following services and utilities. 1. Integrity: a. Physical integrity -- prevention of a physical corruption of the database, such as placement of incorrect pointers or loss of an index when the power is shut off or the operating system fails. b. Logical integrity -- enforcement of the integrity constraints. 2. Backup and recovery: a. The DBMS keeps backup copies of the database so that when the database is lost, or logically or physically corrupted, it can be restored to a previous state. b. The DBMS keeps a log of the updates it performs in the database so that when the database is restored to an old state, the correct updates kept in the log can be redone to make the new database up to date. c. When an application program is performing a complex update of the database, and the program gets aborted due to a run- time error or violates the integrity of the database or decides that it did not want that update after all, the DBMS can undo the update, that is, remove the partial effects of the update. 3. Subschemas, userviews, inference rules -- see Section 4. Security: a. Prevention of persons who are not authorized database users from accessing the database. (This often involves verification of passwords or a similar procedure.) b. Prevention of persons who are authorized users of a part of the database information from accessing other parts. This control is normally done through subschemas and userviews. A user may be allowed to read and update some information and to only read some other information. More complicated services include the distribution and revocation of information access privileges. c. Encryption of the particularly sensitive physical files so that they can not be accessed by bypassing the DBMS. 5. Concurrency control: Several users and/or programs may access one database simultaneously. Some of the major problems addressed by the concurrency control are: a. An incorrect view of the database by one user during a database update being performed by another user. b. Inconsistent simultaneous updates being performed by different users. The logical side (the user's perspective) of these problems is addressed in Section 6. Data dictionary: The information in the schema can be regarded as a small additional database. This database can be queried by the user. | | Example 1-73. | | ------- - -- | | | | A schema query: | | | | | What relations have the category STUDENT as their | | ------- | domain? | | | | | This additional database can also accumulate useful information which is not a proper part of the schema. This may include text explaining the informal meaning of every schema concept. | | Example 1-74. | | ------- - -- | | | | A data dictionary explanation entry for the concept `STUDENT': | | | | | STUDENT -- "any person who is or was a registered student of | | ------- | the University at any time during the past 15 years, | | excluding persons who only attended short Extension | | courses of less than 5 days' duration. `Registered' | | means `paid the minimal registration fee at least once.' | | Insertion of a new student is initiated by the Office of | | the Registrar. Removal of a student from the database | | after the expiration of the 15-year period is performed | | automatically by the data manipulation program ARCHIVE, | | which is run every Sunday night. The removal of a | | student can also be explicitly performed by the Office of | | the Registrar as a correction of an error in inserting | | that nonstudent." | | | | | Some database management systems provide a "data dictionary" which can only assist the DML compiler in validating the program syntax. This is not a proper usage of the term because such a "dictionary" facility is no more than a minimal support of the schema, which is essential in any reasonable database management system. The update of the data dictionary is normally a responsibility of the database administrator -- a person who is the technical manager of a database. 7. Restructuring: When the schema changes, the instantaneous database may become inconsistent with the schema. Restructuring is a utility to transfer the old instantaneous database into a new instantaneous database under a new schema. 8. Distributed database: The database of a large enterprise may be physically partitioned into several subsets which are stored in different geographical sites, such as the branches of the enterprise. In order to avoid the costs and delays of telecommunication, each site physically contains the information which is most frequently used in that site. | | Example 1-75. | | ------- - -- | | | | Assume that our University has several campuses. In every | | campus we would physically store information most frequently | | used in that campus: | | | | | a. The departments of that campus | | - | | | b. The instructors who work in those departments | | - | | | c. The students whose majors or minors are among those | | - | departments | | | | | d. The offerings of courses by those instructors | | - | | | e. The enrollments in those offerings | | - | | | Logically, the users at each site see the whole database (or the fractions thereof which they are authorized to see) regardless of the database's partition. When a user's query or update request necessitates the access to information which is beyond the user's site, the system performs this access in a way transparent to the user. | | Example 1-76. | | ------- - -- | | | Assume that there is a student a at campus c who took some | | - 1 courses offered at another campus, c , by instructors of that | | 2 campus c . | | 2 | | | Now, an administrator of Campus c submits a query: | | 1 | | | `Calculate the grade-point-average of Student s.' | | | | | | The administrator need not be aware of the physical | | distribution of the information. During the execution of the | | query, the DBMS will decide that it needs some information | from Campus c , contact that campus's computer, and give the | | 2 | user the correct result as if all the data were available | | locally. | | | | | The physical subsets comprising the distributed database do not have to be disjoint. | | Example 1-77. | | ------- - -- | | | | | | | a. The personal information about a student whose major | | - | is at one campus and whose minor is at another | | campus will be stored in both campuses. The | | enrollments of that student do not have to be | | duplicated, since enrollments are stored according | | to the offerings, which, in turn, are stored | | according to instructors, which are stored by | | departments. | | | | | b. The information about an instructor who works in | | - Department d of Campus c and also in Department c | | 1 1 2 of Campus c will be stored at both campuses. | | 2 | | c. The catalog of the names of all the courses can be | | - | duplicated at each campus, since it is frequently | | accessed at each campus. | | | | | The possible physical duplication of information introduces one of the major problems of the distributed database management: how to maintain consistency between the copies of information. One of the other problems is the optimization of the routing of information between the sites during the execution of queries and update requests. Another problem is tolerance for failures: when one site or one communication line goes down, we do not want to shut down the whole system. 1.7. Multimedia Databases If the DBMS has no implementational restrictions as to the sizes of strings, then the database can store multimedia data as subcategories of the concrete category String. The following are some of such subcategories: Text - arbitrarily long texts (e.g. the entire text of a book). Image - digitized color photograph. Audio - digitized speech or music. Line-drawing - representation of diagrams, signatures, etc. | | Example 1-78. | | ------- - -- | | | | | | | [] syllabus - attribute of COURSE, range: Text (m:1) | | -------- ------ ---- - - | | | [] photo - attribute of PERSON, range: Image (m:1) | | ----- ------ ----- - - | | | [] signature - attribute of INSTRUCTOR, range: Line- | | --------- ---------- ---- drawing (m:1) | | ------- - - | | | [] outgoing-message - attribute of INSTRUCTOR, range: | | -------- ------- ---------- Audio (m:1) (The message to be played by the | | ----- - - | telephone system when the instructor being called is | | not available.) | | | | |
Reference Schemas
The following are the semantic and relational schemas for the university case study application. These schemas are referred to in most of the examples in this text.


Figure Ref-1. A semantic schema for a university application.


Figure Ref-2. A relational schema for the university application.