| Database | From data and base (adj = low, mean, vile, etc). A place where data can be lost in a structured manner. |
|
DBMS (Database Management System) |
The software needed to set up highly complex inter-relational data structures, so that files can be lost in any convenient sequence (e.g. Index before data; First-in-last-out). |
SBM is one of several existing semantic models. The various semantic models are roughly equivalent and have common principles, even though they somewhat differ in terminology and in the tools they use. SBM is simpler than most other semantic data models: it has a small set of sufficient tools by which all of the semantic descriptors of the other models can be constructed. After mastering SBM, a systems analyst may wish to explore more complex semantic models.
This chapter defines and discusses the concepts of a database, a database management system (DBMS), a database schema, modeling real- world information, categorization of real-world objects, relations between objects, graphic representation of database schemas, integrity constraints, quality of database schemas, subschemas, userviews, database languages, services of DBMS, and multimedia databases.
Normally, a database should cover all the information of one application; there should not be two databases for one application.
Example 1-7
Historic information past the archival period; a course offering which
was canceled before it was given.
1.2.1. Categorization of objects
I am an object, if I am of interest to the university. My name is an object. The Information Systems Department and its name "Information Systems Department" are two distinct objects.
Example 1-14
Some of the categories in the world of our university are:
INSTRUCTOR, PERSON, COURSE, STUDENT, DEPARTMENT.
The categories INSTRUCTOR and STUDENT are not disjoint (Figure 1-1); neither are INSTRUCTOR and PERSON.
Example 1-22
BIRTH-YEAR is an m:1 relation because every person has only
one year of birth:
| person1 | BIRTH-YEAR | 1970 |
| person2 | BIRTH-YEAR | 1970 |
| person3 | BIRTH-YEAR | 1969 |
| person4 | BIRTH-YEAR | 1965 |
2. A binary relation R is one-to-many (1:m) if at no point in time xRy and zRy where x=z.
Example 1-24
The relation MAJOR-DEPARTMENT is not 1:m, since a department
may have many major students.
If, instead of the relation MAJOR-DEPARTMENT, we have the relation MAJOR-STUDENT between departments and students, then this relation would be 1:m, since every student can have at most one major department.
3. Relations which are of neither of the above types are called proper many-to-many (m:m).
Example 1-25
WORKS-IN is a proper m:m relation because every instructor can
work in many departments and every department may employ many
instructors, as in Figure 1-4.
4. A binary relation which is both m:1 and 1:m (always) is called one-to-one (1:1).
Example 1-26
If courses are identified by their names, then the relation
COURSE-NAME is 1:1, meaning that every course has at most one
name, and no character string is the name of two different
courses, as in Figure 1-5.
Example 1-27
Suppose that in the current situation in our real world, the
following is true:
5. A binary relation is proper m:1 if it is
m:1 and not 1:1.
6. A binary relation is proper 1:m if it is
1:m and not 1:1.
Example 1-28
Since the COURSE-NAME is 1:1, it is also 1:m, m:1, and
m:m. Since this relation is proper 1:1, it cannot be
proper 1:m, proper m:1, or proper m:m.
Such complex relationships are regarded in the Semantic Binary Model
as groups of several simple relationships.
Example 1-34
All of the types of relations mentioned in the previous
example are proper.
1.2.2.2. Categories as domains and ranges of relations
Domain and range of a binary relation:
Example 1-29
The domain of COURSE-NAME is the category COURSE and its range
is the category STRING. The domain of WORKS-IN is INSTRUCTOR
and the range is DEPARTMENT.
Example 1-30
Although the domain of the relation BIRTH-DATE is the category
PERSON, the date of birth of some relevant persons is
irrelevant or unknown. Thus, the relation BIRTH-DATE is not
total.
1.2.2.3. Attributes
Some binary relations are often called attributes.
Example 1-31
Example 1-32
Last-name, first-name, and birth-year are attributes of
PERSON.
1.2.3. Nonbinary relationships
Example 1-33
There is a relationship between an instructor, a course, and a
quarter in which the instructor offers the course.
The nonbinary relationship of the previous example is
represented in the Semantic Binary Model by a fourth object,
an offering, and three binary relations between the offering
and the instructor, the quarter, and the course, as in Figure
1-7.
In general, the Semantic Binary Model represents any nonbinary
relation as:
a. An abstract category of events. Each event symbolizes the
existence of a relationship between a group of objects.
b. Functional binary relations, whose domain is category (a).
Each of those functional binary relations corresponds to a
role played by some objects in the nonbinary relation.
Thus, the fact that objects x ,..., x participate in an n-ary
1 n
relation R in roles R ,..., R is represented by:
1 n
a. An object e in the category R'
b. Binary relationships eR x ,..., eR x
- 1 1 n n
Example 1-35.
The information about a course offered by an instructor during
a quarter could be considered a ternary relation between
instructors, courses, and quarters. In the Semantic Binary
Model, we solve this problem by representing this information
as a category COURSE-OFFERING and three functional relations
from COURSE-OFFERING: THE-INSTRUCTOR, THE-COURSE, and THE-
QUARTER.
Instructor i has offered course c in quarter q if and only if
there exists a course-offering o, such that:
o THE-INSTRUCTOR i
o THE-COURSE c
o THE-QUARTER q
1.2.4. Instantaneous databases
Formal representation of an instantaneous binary database - as a set
of facts, unary and binary:
Unary fact - a statement that a certain abstract object belongs to a
certain category.
Example 1-36.
(The person whose name is "Jane Howards") is a student.
Binary fact - a statement that there is a certain relationship between
two given objects.
Example 1-37.
The birth-year of (the person whose name is "Jane Howards") is
1968.
Example 1-38.
(The instructor whose name is "John Smith")
works in
(the department whose name is "Information Systems")
Example 1-39.
The final grade of
(the enrollment of (the student whose name is "Jack Brown") in
(the offering of (the course named "Basic Chemistry") by
(the instructor named "Veronica Hammer") during (the
Fall 1900 quarter)))
is 100.
Although this fact relates to the past, it is still relevant
and thus is a part of today's instantaneous database.
Note: In order to be in the current instantaneous database, the fact
must have been explicitly or implicitly entered at some time and
never canceled since.
1.2.5. Semantic binary schemas
A semantic binary schema is a description of the names and the
properties of all the categories and the binary relations
existing in an application's world.
All the instantaneous databases under the schema should have only
those categories and relations listed in the schema. The sets of
pairs of objects corresponding, in the instantaneous database, to
the categories, and the sets of pairs of objects corresponding to
the relations, should satisfy the properties indicated in the
schema.
The schema should list the following properties of the categories
and relations: the subcategories, the domains and ranges of the
relations, and the types of the relations (proper m:m, proper
m:1, proper 1:m, 1:1).
Semantic Binary Model is a convention of specification of the
structure of the real-world information in the form of a semantic
binary schema. Technically, the model can be regarded as the set
of all the possible binary instantaneous databases.
1.2.6. Schema diagrams
This section shows how schemas can be represented graphically in two
dimensions.
1. In a schema diagram, categories are shown by rectangles.
Example 1-40.
[] STUDENT - category
2. Relations from abstract categories to concrete categories are
shown inside the boxes of the domain-categories as follows:
relation : range type
The range is specified as a programming language data-type. (We
will use the style of Pascal here.)
Example 1-41.
[] DEPARTMENT - category
[] name - attribute of DEPARTMENT, range: String
(1:m) (A department may have several names, but
every name is unique.)
Usually, relations between abstract and concrete categories are
m:1. This is the default type of relations whose ranges are
concrete categories, and it need not be explicitly specified in
the schema for such relations.
Example 1-42.
[] PERSON - category
[] last-name - attribute of PERSON, range: String
(m:1)
[] first-name - attribute of PERSON, range: String
(m:1)
[] birth-year - attribute of PERSON, range: 1870..1990
(m:1)
[] address - attribute of PERSON, range: String (m:1)
3. Relations between abstract categories are shown by arrows between
the categories' rectangles. (The direction of the arrow is from
the domain to the range.) The name and type of the relation are
indicated on the arrow. The default for the type of relations
between abstract categories is m:m.
Example 1-43.
Figure 1-8 represents:
[] DEPARTMENT - category
[] INSTRUCTOR - category
[] works-in - relation from INSTRUCTOR to DEPARTMENT
(m:m)
Figure 1-8. A m:m relation.
4. Subcategories' rectangles are connected to their supercategories'
rectangles by arrows with dashes.
Example 1-44.
Figure 1-9 represents:
[] PERSON - category
[] STUDENT - subcategory of PERSON
Figure 1-9. Subcategories.
5. The disjointness of categories is indicated implicitly:
a. Two categories which have a subcategory in common are not
disjoint. (The common subcategory does not have to be their
immediate subcategory, that is, it may be a subcategory of a
subcategory and so on.)
Example 1-45.
No two of the categories in Figure 1-10 are disjoint from each
other.
[] SMART-INSTRUCTOR - category
[] SMART-PERSON - category
[] INSTRUCTOR - category
[] SMART-INSTRUCTOR - subcategory of SMART-PERSON
[] SMART-INSTRUCTOR - subcategory of INSTRUCTOR
Figure 1-10. Intersecting categories.
b. Two categories which are subcategories of one category (not
necessarily immediate subcategories) are considered not
disjoint, unless otherwise declared in an appendix to the
schema.
Example 1-46.
No two of the categories in Figure 1-11 are disjoint from each
other.
[] PERSON - category
[] STUDENT - subcategory of PERSON
[] INSTRUCTOR - subcategory of PERSON
Figure 1-11. Intersecting categories.
c. The other categories are disjoint from each other, unless
otherwise declared in an appendix to the schema.
Example 1-47.
[diagram skipped]
Example 1-48.
A semantic schema for a university application is given at the
end of this book. This will be the principal reference schema
used in the examples throughout the book.
1.2.7. Abstraction of a database storage structure
The physical implementation of a database is a responsibility of the
DBMS. The implementation of the database should be completely
transparent to the database users, including the database designers
and systems analysts.
Nevertheless, it may be helpful to the reader to have a general idea
of how the database may be implemented. This section shows a possible
idea of database implementation. This is a simplistic implementation.
Efficiency is not a concern here. It would be, of course, a concern
in any actual implementation.
We can represent the abstract objects as integers. The DBMS will
enumerate all the abstract objects. The numbers assigned to the
objects will be invisible to the user, as will all the
implementational details.
We can represent the instantaneous database as a large table. Every
row contains a fact: a unary fact (object -- category), or a binary
fact (object -- relation -- object). This table can be implemented as
a file.
Example 1-49.
The following is a fragment of a file representing an
instantaneous database. The fragment consists of one unary
fact and three binary facts.
| object# | relation or category | object# or value |
|-----------+----------------------------+----------------------+
| o21 | COURSE-ENROLLMENT | |
| o21 | THE-STUDENT | o18 |
| o21 | THE-OFFERING | o17 |
| o21 | FINAL-GRADE | 100 |
Example 1-50. |
| ------- - -- |
| |
| The following figure shows an implementation of an |
| instantaneous database for the university application. It |
| specifies the instantaneous database which will be used in |
| examples throughout this book. |
| |
| |
| category | object# | relation | object# or value |
+--------------+------------+----------------+----------------------+
| DEPARTMENT | o1 | NAME | Computer Science |
| | o1 | NAME | CS |
| | o2 | NAME | Mathematics |
| | o2 | NAME | Math |
| |------------+----------------+----------------------+
| | o3 | NAME | Physics |
| | o4 | NAME | Arts |
| |------------+----------------+----------------------+
| | o5 | NAME | Economics |
| COURSE | o6 | NAME | Databases |
| |------------+----------------+----------------------+
| | o7 | NAME | Gastronomy |
| | o8 | NAME | Football |
+--------------+------------+----------------+----------------------+
| QUARTER | o9 | YEAR | 1990 |
| | o9 | SEASON | Fall |
| | o10 | YEAR | 1990 |
| | o10 | SEASON | Winter |
| |------------+----------------+----------------------+
| | o11 | YEAR | 1990 |
| | o11 | SEASON | Spring |
| INSTRUCTOR | o12 | LAST-NAME | Brown |
| | o12 | FIRST-NAME | George |
| | o12 | BIRTH-YEAR | 1956 |
| | o12 | ADDRESS | 112 Lucky Dr. |
| | o12 | WORKS-IN | o1 |
| | o12 | WORKS-IN | o2 |
| |------------+----------------+----------------------+
| | o13 | LAST-NAME | Watson |
| | o13 | FIRST-NAME | Mary |
| | o13 | BIRTH-YEAR | 1953 |
| | o13 | ADDRESS | 231 Fortune Dr. |
| | o13 | WORKS-IN | o3 |
| | o14 | LAST-NAME | Blue |
| | o14 | FIRST-NAME | John |
| | o14 | BIRTH-YEAR | 1950 |
| | o14 | ADDRESS | 536 Orange Dr. |
| | o14 | WORKS-IN | o2 |
Figure 1-12. An instantaneous semantic database for the university
application.
| category | object# | relation | object# or value |
+------------------+----------+-----------------+-------------------+
|COURSE-OFFERING | o15 | THE-INSTRUCTOR | o12 |
| | o15 | THE-COURSE | o6 |
| | o15 | THE-QUARTER | o9 |
| | o16 | THE-INSTRUCTOR | o12 |
| | o16 | THE-COURSE | o7 |
| | o16 | THE-QUARTER | o9 |
| |----------+-----------------+-------------------+
| | o17 | THE-INSTRUCTOR | o12 |
| | o17 | THE-COURSE | o8 |
| | o17 | THE-QUARTER | o9 |
|STUDENT | o18 | LAST-NAME | Victory |
| | o18 | FIRST-NAME | Elizabeth |
| | o18 | BIRTH-YEAR | 1966 |
| | o18 | ADDRESS | 100 Sun St. |
| | o18 | MAJOR | o1 |
| | o18 | MINOR | o5 |
| |----------+-----------------+-------------------+
| | o19 | LAST-NAME | Howards |
| | o19 | FIRST-NAME | Jane |
| | o19 | BIRTH-YEAR | 1965 |
| | o19 | ADDRESS | 200 Dorms |
| | o19 | MAJOR | o4 |
| | o19 | MINOR | o5 |
| | o20 | LAST-NAME | Wood |
| | o20 | FIRST-NAME | Michael |
| | o20 | BIRTH-YEAR | 1964 |
| | o20 | ADDRESS | 110 Dorms |
| | o20 | MAJOR | o4 |
| | o20 | MINOR | o5 |
+------------------+----------+-----------------+-------------------+
|COURSE-ENROLLMENT | o21 | THE-STUDENT | o18 |
| | o21 | THE-OFFERING | o17 |
| | o21 | FINAL-GRADE | 100 |
| | o22 | THE-STUDENT | o19 |
| | o22 | THE-OFFERING | o17 |
| | o22 | FINAL-GRADE | 70 |
| |----------+-----------------+-------------------+
| | o23 | THE-STUDENT | o19 |
| | o23 | THE-OFFERING | o15 |
| | o23 | FINAL-GRADE | 80 |
Figure 1-12. Continued.
1.3. Integrity Constraints
Integrity constraints (synonyms: integrity law, integrity rules) -
rules attached to a database in order to detect obvious user
errors when updating the database.
1. Static integrity constraints - rules to detect instantaneous
databases which cannot correspond to any probable state of the
application's world in the past, present, or future, regardless
of the database's update history.
| |
Example 1-51. |
| ------- - -- |
| |
| These are some static integrity constraints in our university: |
| |
| |
| o No one has two last names |
| |
| |
| o Every student has at most one major department |
| |
| |
| o First names of people are composed only of letters |
| |
| |
| o Students may not participate in a course before they |
| were born or receive a grade in a course before they |
| are 15 years old. |
| |
| |
A static integrity law can be regarded as a Boolean function from
the set of all the instantaneous databases which are well-formed
according to the schema or the database model. This function
assigns the value false to those instantaneous databases which
cannot correspond to any probable state of the application's
world.
2. Dynamic integrity constraints - as above, but the domain of the
function is the set of transitions between instantaneous
databases, and false is assigned to highly improbable transitions
between states of the application's world.
| |
Example 1-52. |
| ------- - -- |
| |
| The following is a dynamic integrity constraint: |
| |
| |
| The catalog of courses is unerasable -- a course, once |
| entered, may not be removed. |
| |
| |
| |
Example 1-53. |
| ------- - -- |
| |
| |
| |
| If we wish to record the sex of persons in our database and we |
| are sure that nobody's sex is ever recorded wrongly (this is |
| usually a dangerous and unreasonable assumption), and we |
| further assume that a woman cannot become a man, then the |
| following would be another dynamic constraint: |
| |
| |
Once the sex of x is female in an instantaneous database, |
| ------
the sex of x remains female in the next instantaneous |
| ------
| database. |
| |
| |
Note:
(i) Very often some of the integrity constraints are captured
by the schema.
(ii) Integrity constraints should be distinguished from
implementational restrictions:
Implementational restrictions - the inability of the database to
represent some possible situations of the application's world or
the inability to represent them in a logical, natural,
nonredundant, error-avoiding, flexible way.
Implementational restrictions are caused by considerations such
as hardware, software, database model, effort, time, and
expenses.
| |
| Example 1-54. |
| ------- - -- |
| If for the application world of the university we use a |
| database model less powerful than the Semantic Binary Model, |
| then our implementational considerations may require that |
| every instructor is uniquely identified by social security |
| number. This is not the case in the real world of the |
| university, because sometimes an instructor receives a social |
| security number only several months after being hired and |
| becoming of relevance to the university database. |
| |
| The aforementioned is a static implementational restriction. |
| To cope with it we have either to delay the recording of new |
| instructors or to supply them with some temporary numbers. |
| |
| If our implementation further requires that the social |
| security number of a person should remain constant in time, |
| then this would constitute a dynamic restriction. Supplying |
| temporary numbers would not help in this case. Also, this |
| dynamic restriction would not allow for correction of a |
| wrongly recorded social security number (due to a data-entry |
| clerk's mistake). In practice, such a correction may be |
| possible but with an extremely high cost in terms of the |
| programming effort and with a chance to inadvertently corrupt |
| the database. |
| |
| |
1.4. Schema Quality Criteria
We have defined the term schema for the Semantic Binary Model. The
following is a more general definition of the term, regardless of the
database model.
Schema - a description, in terms of a database model, of the concepts
and the information structure of an application's world. It may
be the actual data structure of a database. A schema describes
all the possible instantaneous databases for one given
application's world.
| |
Example 1-55. |
| ------- - -- |
| |
| A schema for our university application should outline the |
basic relevant concepts of the university, such as student, |
| -------
instructor, etc., and the kinds of information to be gathered |
| ----------
| about them. The schema will not allow the database to contain |
| information about salaries of the instructors or about girl- |
| or boyfriends of students since these are outside the scope of |
| the application's world. |
| |
| |
A schema is called high quality if it satisfies the following
criteria:
1. Natural
The schema describes the concepts of its application's world
naturally:
o The schema describes the objects, categories, and relations
as they are in the real world.
o The users can translate ideas easily in both directions
between the concepts of the schema and the natural concepts
of the application world.
2. Nonredundant
The schema contains very little or no redundancy. Redundancy is
the possibility of representing a fact of the application's world
more than once in the same instantaneous database (so that if one
of the representations is removed from the database, no
information is lost; that is, all of the information represented
by the instantaneous database remains unaltered).
The redundancy should be avoided not in order to improve the
storage efficiency -- the storage is not that expensive nowadays.
Moreover, the redundancy in the schema is not directly related to
the redundancy in the physical storage: a logically nonredundant
schema may be physically implemented by a redundant physical
structure in order to improve the access-time efficiency.
The redundancy should be avoided primarily in order to prevent
inconsistency of the database and its update anomalies. When two
facts in the database represent the same information, and that
information is updated, the user may forget to update both facts.
In this case, after the update the two facts would contradict.
This contradiction may cause unpredictable behavior of many
application programs. The ramifications would be much worse than
the local incorrectness of a fact in the database.
When the redundancy is needed for the convenience of the users,
it should be introduced into the userviews (to be defined in the
next section) but not into the schema.
| |
| Example 1-56. |
| ------- - -- |
| The following is a fragment of a redundant wrong schema: |
| ----- |
| [] COURSE-ENROLLMENT - category |
| ------ ---------- |
| [] final-grade - attribute of COURSE-ENROLLMENT, range: |
| 0..100 (m:1) |
| - --- - - |
| [] student's-address - attribute of COURSE-ENROLLMENT, |
| range: String (m:1) |
| ------ - - |
| |
| Suppose a student s, whose address is a, has two enrollments, |
| e and e . Then, the following facts (among others) are |
1 2
| logically recorded in the database: |
| |
| a. s ADDRESS a |
| - ------- |
| b. e THE-STUDENT s |
| - 1 --- ------- |
| c. e THE-STUDENT s |
| - 2 --- ------- |
| d. e STUDENT'S-ADDRESS a |
| - 1 ------- - ------- |
| e. e STUDENT'S-ADDRESS a |
| - 2 ------- - ------- |
| The facts (d) and (e) can be inferred from the facts (a) |
| through (c). Thus, (d) and (e) can be omitted from the |
| database without altering the information represented by |
| the database. These facts are redundant; the schema |
| should not have allowed their entrance in the first |
| place. |
| |
| |
| If the following relation were omitted from the above |
| wrong schema, |
| |
| |
[] address - attribute of PERSON, range: String |
| ------- ------ ------
(m:1) |
| - - |
| |
| then the schema would still remain redundant and wrong, |
| because fact (e) can be inferred from fact (d). |
| (Additionally, it would be a problem to record the |
| address of a student who has no enrollments yet). |
| |
| |
In some database models we cannot eliminate the redundancy
completely. When we have to have some redundancy, we should at
least bind it by integrity constraints. When such constraints
are implemented, the user is forced to update all the related
facts simultaneously.
| |
Example 1-57. |
| ------- - -- |
| |
| If we cannot avoid having the redundant schema of the previous |
| example, we can at least try to enforce the following |
| integrity constraint: |
| |
| |
| Whenever |
| |
| |
s ADDRESS a and e THE-STUDENT s |
| ------- --- ------- |
| |
| then |
| |
| |
e STUDENT'S-ADDRESS a |
| ------- - ------- |
| |
3. Nonrestrictive
The schema does not impose implementational restrictions; that
is, every situation probable in the real world of the application
is fully representable under the schema.
| |
| Example 1-58. |
| ------- - -- |
| A schema containing the following relation would prevent very |
| senior citizens from entering our university: |
| |
| |
[] birth-year - attribute of PERSON, range: 1900..2000 |
| ----- ---- ------ ---- ----
(m:1) |
| - - |
| |
| This schema imposes an avoidable implementational restriction, |
| and thus it is wrong. |
| |
| |
4. Covering integrity constraints
The schema covers by itself as many integrity constraints as
possible; that is, the class of instantaneous databases formally
possible according to the schema is not much larger than the
class of all possible situations of the real world.
Constraints that are not expressed in the schema cause these
problems:
o They are hard to formulate and to specify.
o They are seldom enforced by the DBMS. Thus, they require a
substantial application programming effort for their
enforcement, are often implemented incorrectly, and usually
prevent direct interaction between the user updating the
database and the DBMS (the user may not use the standard
language for simple updates, which is supplied by most
DBMS).
o The users and application programmers often forget or
misunderstand such constraints.
[] DEPARTMENT - category
[] INSTRUCTOR - category
[] works-in - relation from INSTRUCTOR to DEPARTMENT (m:m)
Figure 1-13. A good schema.
| |
| Example 1-59. |
| ------- - -- |
| Figure 1-14 shows a fragment of a poor schema, with respect to |
| the coverage of the integrity constraints by the schema. |
| |
| It requires an integrity constraint not expressed in the |
| schema: |
| |
| For no instructor are there two events of his or her |
| work in the same department. |
| A better schema fragment is in Figure 1-13. |
| |
| |
[] DEPARTMENT - category
[] WORK - category
[] INSTRUCTOR - category
[] the-department - relation from WORK to DEPARTMENT (m:1)
[] the-instructor - relation from WORK to INSTRUCTOR (m:1)
Figure 1-14. A poor schema.
5. Flexible
The schema is flexible: if probable changes in the concepts of
the application world occur, the schema would not have to undergo
drastic changes.
6. Conceptually minimal
The schema is conceptually minimal: it does not involve concepts
which are irrelevant in the application's real world and limits
the accumulation of information which is irrelevant in that
world.
| |
Example 1-60. |
| ------- - -- |
| |
| The following would be irrelevant and wrong in the schema of |
| our university: |
| |
| |
[] BEAUTIFUL - subcategory of STUDENT |
| --------- -------
| |
| |
Example 1-61. |
| ------- - -- |
| |
| The following would be irrelevant and wrong in the schema of |
| our university, unless it is unavoidable due to technical |
| problems: |
| |
| |
[] enrollment-number - attribute of ENROLLMENT, range: |
| ---------- ------ ----------
Integer (m:1) |
| ------- - - |
| |
The most important issue of the database design is the design of a
high-quality schema within the restrictions of the available DBMS and
database model. A low-quality schema increases the chances of
corruption of the data, makes it very hard to use and maintain the
database, and makes it very hard, if not impossible, to adjust the
database to the changing concepts of the application's real world.
It is easy to design a high-quality schema in the Semantic Binary
Model. The task is much harder in most other models. Moreover, it is
usually impossible to describe an application world by a schema in the
Relational, Network, or Hierarchical model with the same high quality
as with which that application can be described in the Semantic Binary
Model.
The following chapters will introduce methodologies to design
conceptually adequate schemas in the Relational, Network, and
Hierarchical models. Those schemas will be close to the highest
quality possible within the restrictions of the respective models. In
those methodologies, a semantic binary schema is designed first, and
then the schema is translated into the model supported by the DBMS
which will service the application.
1.5. Subschemas and Userviews
Subschema -- a part of the schema, provided this part in itself can
constitute a schema for some application world.
| |
Example 1-62. |
| ------- - -- |
| |
| The following figure shows a schema for a very small |
| application world. The only relevant information in that world |
| is the names of the courses. |
| |
| |
[] COURSE - category |
| ------ |
| |
[] name - attribute of COURSE, range: String (m:1) |
| ---- ------ ------ - - |
| |
| |
| |
| This schema is a subschema of the University Binary Schema of |
Figure . |
| --- |
| |
A subschema of a binary schema can be obtained by removing some of the
categories and some of the relations (provided, whenever a category is
removed, every relation whose domain or range is the category is also
removed).
The primary use of the subschemas is to provide subpopulations of the
database users with a partial view of the database information. The
user of a subschema may regard it as if it were the entire schema and
need not be aware of the existence of the information beyond the
subschema. The DBMS will conceal from such a user all the information
beyond the subschema. This brings the following benefits:
o Users do not have to understand the information concepts which
are irrelevant to their activities.
o Users are prevented from accidentally corrupting the information
which they had no business to access in the first place but
accessed in error instead of the relevant information.
o Malicious users are prevented from accessing information beyond
that which they are entitled to access.
o When the database is extended by adding new concepts to the
schema, and those new concepts do not affect some existing
programs using subschemas, those programs need not be modified.
| |
Example 1-63. |
| ------- - -- |
| |
| The subschema of Figure 1-15 covers the names of the courses |
| and the seasons in which the courses are offered. |
| |
| |
[] QUARTER - category
[] COURSE - category
[] COURSE-OFFERING - category
[] season - attribute of QUARTER, range: String (m:1)
[] name - attribute of COURSE, range: String (m:1)
[] the-course - relation from COURSE-OFFERING to COURSE (m:1)
[] the-quarter - relation from COURSE-OFFERING to QUARTER
(m:1)
Figure 1-15. Subschema for seasons of courses.
Normally, the schema is partitioned into nondisjoint subschemas
according to the needs of the different divisions within the
enterprise and the different subapplications.
[diagram skipped]
Figure 1-16. The users access the database through subschemas.
[diagram skipped]
Figure 1-17. The schema is partitioned into subschemas, which need
not be disjoint.
| |
Example 1-64. |
| ------- - -- |
| |
| Figure 1-18 shows the subschema used by the Personnel Office. |
| |
| |
[] DEPARTMENT - category
[] INSTRUCTOR - category
[] QUARTER - category
[] COURSE-OFFERING - category
[] works-in - relation from INSTRUCTOR to DEPARTMENT (m:m)
[] name - attribute of DEPARTMENT, range: String 1:m (m:1)
[] last-name - attribute of INSTRUCTOR, range: String (m:1)
[] first-name - attribute of INSTRUCTOR, range: String (m:1)
[] birth-year - attribute of INSTRUCTOR, range: 1870..1990
(m:1)
[] address - attribute of INSTRUCTOR, range: String (m:1)
[] year - attribute of QUARTER, range: 1980..1995 (m:1)
[] season - attribute of QUARTER, range: String (m:1)
[] the-instructor - relation from COURSE-OFFERING to
INSTRUCTOR (m:1)
[] the-quarter - relation from COURSE-OFFERING to QUARTER
(m:1)
Figure 1-18. A subschema for the University Personnel Office.
A generalization of the subschema concept, the userview, is defined in
the remainder of this section.
Inference rules - rules by which new information can be deduced from
the information that the users have entered into the database.
| |
Example 1-65. |
| ------- - -- |
| |
| The following are some parts of the information recorded in |
| the university database: |
| |
| |
| o For every instructor, the classes he or she teaches |
| |
| |
| o For every student, the classes he or she takes |
| |
| |
| An inference rule: |
| |
| |
| o If s takes a class taught by p, then p TEACHES s. |
| |
| |
Userview - an alternative view on the application world.
A userview is a means of alternative comprehension of a part or all of
the application world's information. A userview consists of an
alternative schema and inference rules by which every instantaneous
database characterized by the original schema implies the (logical)
instantaneous database characterized by the alternative schema (Fig.
1-19).
[diagram skipped]
Figure 1-19. A userview.
| |
Example 1-66. |
| ------- - -- |
| |
| Figure 1-20 shows the alternative schema of the userview that |
| covers the names of the courses and the seasons in which the |
| courses are offered. It is more convenient to the user than |
the subschema of Figure . |
| --- |
| |
[] COURSE-SEASON - category
[] name - attribute of COURSE-SEASON, range: String (m:1)
[] season - attribute of COURSE-SEASON, range: String m:m
(m:1)
Figure 1-20. A userview: seasons of courses.
One userview can be used by a subpopulation of the application world's
users. Such a userview would conceal from those users all the
information which is irrelevant for them. The remaining information
is presented to these users in a form which is most convenient to
these particular users.
| |
Example 1-67. |
| ------- - -- |
| |
| The computer science faculty secretary might use a userview |
| containing only the addresses of the faculty. The userview |
| has the alternative schema shown in Figure 1-21. |
| |
| |
[] COMPUTER-SCIENCE-INSTRUCTOR - category |
| -------- ------- ---------- |
| |
[] last-name - attribute of |
| ---- ----
COMPUTER-SCIENCE-INSTRUCTOR, range: String (m:1) |
| -------- ------- ---------- ------ - - |
| |
[] first-name - attribute of |
| ----- ----
COMPUTER-SCIENCE-INSTRUCTOR, range: String (m:1) |
| -------- ------- ---------- ------ - - |
| |
[] address - attribute of COMPUTER-SCIENCE-INSTRUCTOR, |
| ------- -------- ------- ----------
range: String (m:1) |
| ------ - - |
| |
Figure 1-21. A userview: computer science instructors. |
| ------ - -- |
| |
| |
| |
The inference rule for the category COMPUTER-SCIENCE- |
| -------- -------
INSTRUCTOR is: |
| ---------- |
| |
A computer science instructor is an INSTRUCTOR who |
| ----------
WORKS-IN the DEPARTMENT whose NAME is "Computer |
| ----- -- ---------- ----
| Science." |
| |
| |
| |
Example 1-68. |
| ------- - -- |
| |
| A subpopulation of the users is interested only in knowing who |
| taught whom. Their userview has the alternative schema shown |
| in Figure 1-22. |
| |
| |
The inference rule for the relation TAUGHT has been given on |
| ------
page . The other concepts of the alternative schema are |
| ---
| copied from the schema. |
| |
| |
[] PERSON - category
[] STUDENT - subcategory of PERSON
[] INSTRUCTOR - subcategory of PERSON
[] taught - relation from INSTRUCTOR to STUDENT (m:m)
[] last-name - attribute of PERSON, range: String (m:1)
[] first-name - attribute of PERSON, range: String (m:1)
[] birth-year - attribute of PERSON, range: 1870..1990 (m:1)
[] address - attribute of PERSON, range: String (m:1)
Figure 1-22. A userview: the inferred relation taught.
Some userviews do not omit any information and can be used by all the
users. In this case, a userview presents the same information as the
schema does, but in a form most suitable for some particular purposes.
All the information which can be deduced from an instantaneous
database idb by a userview u is called the instantaneous database
under userview u corresponding to idb.
[diagram skipped]
Figure 1-23. Instantaneous database under a userview.
Unlike the schema, the alternative schema of a userview may contain
redundant information if it adds to the convenience of the users.
This redundancy cannot cause inconsistency, as it would in the case of
the schema redundancy, since the updates are translated into the terms
of the schema updates, before the updates are actually performed.
The usage of userviews also greatly enhances the flexibility of the
database. Suppose we have a program that uses a userview, and the
concepts of the application world change.
o If the change does not affect the logical decisions of this
particular program, then we would like to avoid the need to
modify the program.
o So, we will define a new userview to be used by the program.
o The alternative schema of the userview would be identical to
the alternative schema of the old userview, so the program
would not notice the difference.
o The inference rules would be, of course, different.
[diagram skipped]
Figure 1-24. A change in the schema does not have to affect some
users and programs.
| |
Example 1-69. |
| ------- - -- |
| |
| Suppose we have some programs that use the information of the |
| relation |
| |
| |
[] works-in - relation from INSTRUCTOR to DEPARTMENT |
| ----- -- ---------- ----------
(m:m) |
| - - |
| |
| Now, the university has become more sophisticated, and for |
| certain future programs it will be important to know the |
| percentage of time that the instructors work for the |
| departments. This new information is irrelevant to most old |
| programs. The new schema will contain the fragment shown in |
| Figure 1-25. |
| |
| |
[] DEPARTMENT - category |
| ---------- |
| |
[] WORK - category |
| ---- |
| |
[] INSTRUCTOR - category |
| ---------- |
| |
[] percentage-of-time - attribute of WORK, range: |
| ---------- -- ---- ----
0..100% (m:1) |
| - --- - - |
| |
[] the-department - relation from WORK to DEPARTMENT |
| --- ---------- ---- ----------
(m:1) |
| - - |
| |
[] the-instructor - relation from WORK to INSTRUCTOR |
| --- ---------- ---- ----------
(m:1) |
| - - |
| |
| The alternative schema for the unaffected old programs will |
| remain unchanged and will still contain the old relation |
WORKS-IN, but the inference rule of their new userview will |
| ----- --
| be: |
| |
| |
If there is a WORK whose department is d and whose |
| ----
instructor is i, then i WORKS-IN d. |
| ----- -- |
| |
Figure 1-25. A modified schema: a new category WORK.
Note: A subschema is a userview whose alternative schema is a part or
all of the original schema, and the inference rules are trivial:
they copy the information of the alternative schema's concepts.
1.6. Services of DBMS
1.6.1. Database languages: Query, DML, DDL, and others
This section introduces the major types of languages supported by
database management systems.
Retrieval query -- a specification of information which a user wants
to extract or to deduce from the database without knowing the
full extension of the instantaneous database.
| |
Example 1-70. |
| ------- - -- |
| |
| The following is a retrieval query specified in English: |
| |
| |
``What students failed the Databases course last |
| ---------
| year?'' |
| |
| |
| Most DBMSs do not understand English. Thus, a specification |
| in a more formal language is needed. Such a formal language |
| need not be a programming language. Most DBMSs support some |
| user-oriented formal language, in which the users can specify |
| queries without writing programs. |
| |
| |
Query language - a nonprogramming language in which a user can
formulate retrieval queries and possibly also update the
database. Nonprogramming means that the user does not have to
specify an algorithm for the problem but only has to define the
problem in a formal way.
Some query languages are simple enough to be used by the end user
-- a database user who has no computer knowledge or experience.
Other, more complex, languages are used by computer
professionals. Yet, the professional user can save a significant
amount of time by using such a nonprogramming language rather
than writing a program.
Data manipulation language, DML - a programming language that has a
powerful capability of computations, flow of control, input-
output, and also has syntactic constructs for database access
(the update, retrieval, and dynamic exchange of information
between the program and the database). The DML is used by the
application programmer.
A data manipulation language may be:
1. A stand-alone DML - a special-purpose language. In this case,
the DBMS provides a compiler or an interpreter for the DML. The
disadvantage of the stand-alone language is that it cannot be
used for complex programs which perform some database access but
also simultaneously perform other tasks, for example, numeric
calculation or industrial assembly line monitoring.
2. A system call interface. In this case, the user writes a program
in a regular programming language. The user performs database
access by subroutine calls to the DBMS. In a call, the user
provides the system with a description of the user's request,
parameters of the request, and output variables in which the
system will produce the result of the database access.
| |
| Example 1-71. |
| ------- - -- |
| The following is a fragment of a Pascal program with system |
| calls. |
| |
| ... |
| |
| last-name := `Jefferson' |
| |
| dbmscall ( |
| |
| `Dear DBMS: Please find the instructor whose last |
| name is given in the second argument of this |
| call. Place a reference to that instructor |
| into the third argument of this call. If |
| everything is OK, set the fourth argument of |
| this call to 0. If there are several |
| instructors by the given name, set the fourth |
| argument to 1. If there are no such |
| instructors, set it to 2. If another problem |
| occurs, set the fourth argument to a number |
| greater than 2.' (* In a real program, of |
| course, a code would be given instead of this |
| "short story." *), |
| |
| last-name, instructor-reference, return-code); |
| |
| if (return-code > 0) |
| |
| then write (` A DBMS error.') |
| |
| else begin |
| |
| dbmscall ( |
| |
| `Dear DBMS: Please relate the instructor, |
| referred to by the second argument of |
| this call, by the relation BIRTH- |
| YEAR to the number given in the third |
| argument. If everything is OK, set |
| the fourth argument of this call to |
| 0. If a problem occurs, set the |
| fourth argument to a number greater |
| than 0.' (*Of course, in a real |
| system call, one would abbreviate |
| this "short story" by a code*), |
| |
| |
| instructor-reference, 1960, return-code); |
| |
| |
| if (return-code > 0) |
| |
| |
| then write (` A DBMS error.') |
| |
| |
| end |
| |
| |
The system-call interfaces are usually very unfriendly and hard
to program in.
The system-calls are interpreted at run-time of the program. One
ramification of this is that if a system call contains a
syntactically incorrect request, or a request inconsistent with
the schema, the user cannot be notified at compile-time but has
to wait until the program aborts. Then, the partial effects of
the aborted program would have to be undone.
Another ramification is the poor efficiency of the run-time
interpretation versus compilation.
3. A DML embedded in a host programming language. This is a
database access extension of a general-purpose programming
language. In a program, the host language statements are
interleaved with DML statements.
| |
Example 1-72. |
| ------- - -- |
| |
| The following is a two-statement fragment of an application |
| program written in an embedded DML language whose host |
| language is Pascal. |
| |
| |
| ... |
| |
| |
| write (`This is a regular Pascal statement which prints |
| this sentence.'); |
| |
| |
relate: i WORKS-IN d (*This is a DML statement*) |
| ----- -- |
| |
The DBMS precompiles the program into a program in the host
language without the DML statements. During the precompilation,
the DBMS validates the syntax and the compatibility with the
schema of the database. The DBMS may also perform optimization
of the user's algorithm.
The resulting program is compiled by the host language compiler.
When the program runs, it may communicate to the DBMS, but the
system calls of this communication are transparent to the user.
Report generator -- a language in which the user can specify a query
together with requirements on the visual form of the output, such
as nicely printed tables with titles, or bottom-of-page
summaries.
Data entry language -- a language in which the end users can specify
database updates online.
Data definition language, DDL - a language in which the logical
structure of information in the application's world can be
defined, together with its pragmatic interpretation for the
management of a database, including the schema, integrity
constraints, and userviews.
Many DDLs also allow for modification or redefinition of the
database structure. This is needed when the concepts of the
application world change and when the database designer finds a
better description of the existing real-world concepts
(particularly during the initial database design process, which
is often trial and error).
In many DBMSs, the integrity constraints are specified in
languages other than the DDL. In some DBMSs the constraint are
incorporated in DML programs.
Many DBMSs do not provide any language at all to express the
integrity constraints, other than those expressed in the schema.
The other integrity constraints are left as the application
programmer's responsibility. In this case, an application
programmer should write a DML program which will:
a. Collect the requests for updates from the users
b. Check whether these requests would violate the integrity
constraints
c. Reject the requests that would violate the integrity
d. Submit to the DBMS the remaining good requests
1.6.2. Other services and utilities of DBMS
Most database management systems provide some or all of the following
services and utilities.
1. Integrity:
a. Physical integrity -- prevention of a physical corruption of
the database, such as placement of incorrect pointers or
loss of an index when the power is shut off or the operating
system fails.
b. Logical integrity -- enforcement of the integrity
constraints.
2. Backup and recovery:
a. The DBMS keeps backup copies of the database so that when
the database is lost, or logically or physically corrupted,
it can be restored to a previous state.
b. The DBMS keeps a log of the updates it performs in the
database so that when the database is restored to an old
state, the correct updates kept in the log can be redone to
make the new database up to date.
c. When an application program is performing a complex update
of the database, and the program gets aborted due to a run-
time error or violates the integrity of the database or
decides that it did not want that update after all, the DBMS
can undo the update, that is, remove the partial effects of
the update.
3. Subschemas, userviews, inference rules -- see Section
4. Security:
a. Prevention of persons who are not authorized database users
from accessing the database. (This often involves
verification of passwords or a similar procedure.)
b. Prevention of persons who are authorized users of a part of
the database information from accessing other parts. This
control is normally done through subschemas and userviews.
A user may be allowed to read and update some information
and to only read some other information. More complicated
services include the distribution and revocation of
information access privileges.
c. Encryption of the particularly sensitive physical files so
that they can not be accessed by bypassing the DBMS.
5. Concurrency control:
Several users and/or programs may access one database
simultaneously. Some of the major problems addressed by the
concurrency control are:
a. An incorrect view of the database by one user during a
database update being performed by another user.
b. Inconsistent simultaneous updates being performed by
different users.
The logical side (the user's perspective) of these problems is
addressed in Section
6. Data dictionary:
The information in the schema can be regarded as a small
additional database. This database can be queried by the user.
| |
Example 1-73. |
| ------- - -- |
| |
| A schema query: |
| |
| |
What relations have the category STUDENT as their |
| -------
| domain? |
| |
| |
This additional database can also accumulate useful information
which is not a proper part of the schema. This may include text
explaining the informal meaning of every schema concept.
| |
Example 1-74. |
| ------- - -- |
| |
| A data dictionary explanation entry for the concept `STUDENT': |
| |
| |
STUDENT -- "any person who is or was a registered student of |
| -------
| the University at any time during the past 15 years, |
| excluding persons who only attended short Extension |
| courses of less than 5 days' duration. `Registered' |
| means `paid the minimal registration fee at least once.' |
| Insertion of a new student is initiated by the Office of |
| the Registrar. Removal of a student from the database |
| after the expiration of the 15-year period is performed |
| automatically by the data manipulation program ARCHIVE, |
| which is run every Sunday night. The removal of a |
| student can also be explicitly performed by the Office of |
| the Registrar as a correction of an error in inserting |
| that nonstudent." |
| |
| |
Some database management systems provide a "data dictionary"
which can only assist the DML compiler in validating the program
syntax. This is not a proper usage of the term because such a
"dictionary" facility is no more than a minimal support of the
schema, which is essential in any reasonable database management
system.
The update of the data dictionary is normally a responsibility of
the database administrator -- a person who is the technical
manager of a database.
7. Restructuring:
When the schema changes, the instantaneous database may become
inconsistent with the schema. Restructuring is a utility to
transfer the old instantaneous database into a new instantaneous
database under a new schema.
8. Distributed database:
The database of a large enterprise may be physically partitioned
into several subsets which are stored in different geographical
sites, such as the branches of the enterprise. In order to avoid
the costs and delays of telecommunication, each site physically
contains the information which is most frequently used in that
site.
| |
Example 1-75. |
| ------- - -- |
| |
| Assume that our University has several campuses. In every |
| campus we would physically store information most frequently |
| used in that campus: |
| |
| |
a. The departments of that campus |
| - |
| |
b. The instructors who work in those departments |
| - |
| |
c. The students whose majors or minors are among those |
| -
| departments |
| |
| |
d. The offerings of courses by those instructors |
| - |
| |
e. The enrollments in those offerings |
| - |
| |
Logically, the users at each site see the whole database (or the
fractions thereof which they are authorized to see) regardless of
the database's partition. When a user's query or update request
necessitates the access to information which is beyond the user's
site, the system performs this access in a way transparent to the
user.
| |
Example 1-76. |
| ------- - -- |
| |
Assume that there is a student a at campus c who took some |
| - 1
courses offered at another campus, c , by instructors of that |
| 2
campus c . |
| 2 |
| |
Now, an administrator of Campus c submits a query: |
| 1
| |
| `Calculate the grade-point-average of Student s.' |
| |
| |
| The administrator need not be aware of the physical |
| distribution of the information. During the execution of the |
| query, the DBMS will decide that it needs some information |
from Campus c , contact that campus's computer, and give the |
| 2
| user the correct result as if all the data were available |
| locally. |
| |
| |
The physical subsets comprising the distributed database do not
have to be disjoint.
| |
Example 1-77. |
| ------- - -- |
| |
| |
| |
a. The personal information about a student whose major |
| -
| is at one campus and whose minor is at another |
| campus will be stored in both campuses. The |
| enrollments of that student do not have to be |
| duplicated, since enrollments are stored according |
| to the offerings, which, in turn, are stored |
| according to instructors, which are stored by |
| departments. |
| |
| |
b. The information about an instructor who works in |
| -
Department d of Campus c and also in Department c |
| 1 1 2
of Campus c will be stored at both campuses. |
| 2
| |
c. The catalog of the names of all the courses can be |
| -
| duplicated at each campus, since it is frequently |
| accessed at each campus. |
| |
| |
The possible physical duplication of information introduces one
of the major problems of the distributed database management:
how to maintain consistency between the copies of information.
One of the other problems is the optimization of the routing of
information between the sites during the execution of queries and
update requests. Another problem is tolerance for failures: when
one site or one communication line goes down, we do not want to
shut down the whole system.
1.7. Multimedia Databases
If the DBMS has no implementational restrictions as to the sizes of
strings, then the database can store multimedia data as subcategories
of the concrete category String. The following are some of such
subcategories:
Text - arbitrarily long texts (e.g. the entire text of a book).
Image - digitized color photograph.
Audio - digitized speech or music.
Line-drawing - representation of diagrams, signatures, etc.
| |
Example 1-78. |
| ------- - -- |
| |
| |
| |
[] syllabus - attribute of COURSE, range: Text (m:1) |
| -------- ------ ---- - - |
| |
[] photo - attribute of PERSON, range: Image (m:1) |
| ----- ------ ----- - - |
| |
[] signature - attribute of INSTRUCTOR, range: Line- |
| --------- ---------- ----
drawing (m:1) |
| ------- - - |
| |
[] outgoing-message - attribute of INSTRUCTOR, range: |
| -------- ------- ----------
Audio (m:1) (The message to be played by the |
| ----- - -
| telephone system when the instructor being called is |
| not available.) |
| |
| |


Figure Ref-2. A relational schema for the university
application.