Chapter 11
*Fifth-Generation Languages
This chapter discusses several fifth-generation languages. Sections
1-3 address issues of expressive power of logic-based database
languages and discuss Prolog-like languages and a logic-based language
which attains computational completeness. Section 4 discusses user-
friendly interfaces, using the Query-By-Example language as an
example.
- The optional sections of Chapter 3 are prerequisite to reading this
chapter.
Not every query can be specified in the languages based on Predicate
Calculus.
|
Example 11-1.
A person x can improve grades of a person y if
x teaches y or x teaches a person who can
improve grades of y. A query to find whether x can improve
grades of y cannot be specified in the calculus.
|
|
Example 11-2.
Consider the following relational subschema of the bill of
materiel of items. Each row in the table contains the names
of two items where one item has the other as its immediate
component.
|
COMPONENT
contained-item: String
containing-item: String
|
We cannot specify in Predicate Calculus or in SQL or in the
Relational Algebra the following query:
- `Print a list of all the pairs of items which
directly or indirectly contain one another.'
|
The queries in the examples above are recursive queries. Some
unexpressible recursive queries would become expressible if we extend
the calculus with additional constructs, such as the so-called transitive
closure operator or the more powerful fixed-point operator.
But no matter what additional constructs we add to the Predicate
Calculus, it will still be possible to encounter a query which cannot
be specified in the extended language. A solution to this problem
will be suggested in a later section.
In the section on the calculus for transactions we have used an insert
operation with the following syntax:
- insert into category (relation1 :
expression1 , ... , relationk :
expressionk) where condition
Let us consider now a program which iteratively performs a set of
insert statements and terminates when there is nothing more to
insert.
|
Example 11-3.
Let ITEM-NAME be a concrete category of strings.
The following program augments the table COMPONENT by all the
pairs of items which directly or indirectly contain one
another.
- repeat
- insert into COMPONENT
- (CONTAINING-ITEM: containing,
- CONTAINED-ITEM: contained)
- where
- exists intermediate in ITEM-NAME:
- COMPONENT (CONTAINING-ITEM: containing,
CONTAINED-ITEM: intermediate) and
- COMPONENT (CONTAINING-ITEM: intermediate,
CONTAINED-ITEM: contained)
until nothing new has been inserted in the last
iteration
|
A Prolog-like database program is an equivalent of a program with one
repeat loop enclosing several insert statements:
- repeat
- insert-statements
- until nothing new has been inserted in the last iteration
(We interpret the insert operation as adding only information which is
not already there.)
Normally, Prolog-like languages allow only very primitive conditions
within the insert statements: they do not allow the quantifier "for
every" or complex expressions within those conditions. In exchange
for this limitation, the Prolog-like languages can perform the program
by a much faster algorithm than the obvious straightforward
implementation of the loop.
A Prolog-like program can be used for retrieval of information from
the database if instead of inserting new objects into the existing
categories in the database we perform insertion into output tables.
|
Example 11-4.
The following program will produce a table OUTPUT-COMPONENT
which will be composed of all the pairs of items which
directly or indirectly contain one another. The input is the
original table COMPONENT, which is composed of the pairs of
items directly containing one another.
- repeat
- insert into OUTPUT-COMPONENT
- (CONTAINING-ITEM: containing,
- CONTAINED-ITEM: contained)
- where
- COMPONENT (CONTAINING-ITEM: containing,
CONTAINED-ITEM: contained);
- insert into OUTPUT-COMPONENT
- (CONTAINING-ITEM: containing,
CONTAINED-ITEM: contained)
- where
- exists intermediate in ITEM-NAME:
- COMPONENT (CONTAINING-ITEM:
containing, CONTAINED-ITEM:
intermediate) and
- OUTPUT-COMPONENT (CONTAINING-ITEM:
intermediate, CONTAINED-ITEM:
contained)
until nothing new has been inserted in the last iteration
- (* The first insert statement will be performed only once. *)
|
The above description of Prolog-like languages has a strong procedural
flavor because of the ``repeat . . . until'' loop. We can describe a
Prolog-like program nonprocedurally - as an assertion which links the
relations of the input database and the relations to be produced as
the output of the query.
|
Example 11-5.
The above query can be nonprocedurally regarded as the
following assertion:
- for every containing in ITEM-NAME:
- for every contained in ITEM-NAME:
- if COMPONENT
- (CONTAINING-ITEM: containing,
CONTAINED-ITEM: contained)
- then OUTPUT-COMPONENT
- (CONTAINING-ITEM: containing,
CONTAINED-ITEM: contained)
and
for every containing in ITEM-NAME:
- for every contained in ITEM-NAME:
- for every intermediate in ITEM-NAME:
- if
- COMPONENT (CONTAINING-ITEM: containing,
CONTAINED-ITEM: intermediate) and
- OUTPUT-COMPONENT (CONTAINING-ITEM:
intermediate, CONTAINED-ITEM: contained)
- then OUTPUT-COMPONENT
- (CONTAINING-ITEM: containing,
CONTAINED-ITEM: contained)
The pragmatic meaning of such an assertion is:
- Output a set of tuples which, when regarded as a
- table OUTPUT-COMPONENT, would make the above
- assertion come true. Do not output any extra tuples
- which are not needed to make the assertion true.
|
The Prolog-like languages described above are sometimes referred to as
fifth-generation database languages, although this term can be broadly
applied to all powerful nonprocedural database languages, particularly
the languages based on Predicate Calculus.
The actual syntax of Prolog-like languages is usually somewhat
different from the ``insert'' notation shown above, but it is
equivalent to that notation. The loop specification is omitted. The
quantifiers ``exists'' in the conditions are implicit. (No ambiguity
arises since the ``for every'' quantifier is normally not allowed as a
part of the insert conditions in Prolog-like languages.)
|
Example 11-6.
The above program would be written as follows in some Prolog-like
languages:
- OUTPUT-COMPONENT(CONTAINING-ITEM: containing,
CONTAINED-ITEM: contained) <-
- COMPONENT (CONTAINING-ITEM: containing,
CONTAINED-ITEM: contained);
- OUTPUT-COMPONENT (CONTAINING-ITEM: containing,
CONTAINED-ITEM: contained) <-
- COMPONENT (CONTAINING-ITEM: containing,
CONTAINED-ITEM: intermediate),
- OUTPUT-COMPONENT (CONTAINING-ITEM:
intermediate, CONTAINED-ITEM: contained)
|
The expressive power of Prolog-like database languages, that is, the
ability of these languages to express a wide range of queries,
depends, in part, on what is allowed to appear on the right side of
the <- statement. However, even if we allow arbitrary first-order
Predicate Calculus assertions on the right side of the statements,
there would still be many reasonable queries which cannot be specified
in the Prolog-like database languages without sacrificing the
nonprocedurality of the language. The cause of such limitation is
the limit to what one can do with the "insert until nothing new comes"
construct.
A nonprocedural
language more powerful than the Prolog-like database languages has
been proposed by N. Rishe.
In this language, a query is specified simply as an assertion about
the relations to appear in the output, linking those relations to the
information in the input instantaneous database.
|
Example 11-7.
The previously considered example on page ____, which was an
assertion equivalent to a Prolog program producing the table
of the pairs of components indirectly containing one another,
is also an example of a query specification in the language
discussed in this section.
|
|
Example 11-8.
The following is an assertion stating that the output shall
display the item which indirectly contains all the items (if
such a superitem exists).
We assume that the userview has
- contains - relation from ITEM-NAME to ITEM-NAME
(m:m)
The output is the category SUPERITEM-OUTPUT. This category
will have at most one object - the superitem. In the formulation of
the query we define a temporary relation INDIRECTLY-CONTAINS.
The assertion states that this temporary relation relates all the pairs
of items indirectly containing one another. (This includes an item
containing itself.) This temporary relation will not become a part of
the output.
- for every item in ITEM-NAME:
- item INDIRECTLY-CONTAINS item
- and
- for every containing in ITEM-NAME:
- for every contained in ITEM-NAME:
- for every intermediate in ITEM-NAME:
|
if
|
containing CONTAINS intermediate and intermediate
INDIRECTLY-CONTAINS contained
|
|
then
|
containing INDIRECTLY-CONTAINS contained
|
- and
- for every si in ITEM-NAME:
- if
- (for every item in ITEM-NAME:
- si INDIRECTLY-CONTAINS item)
- then si is a SUPERITEM-OUTPUT
- The pragmatic meaning of such an assertion is:
- Create an instantaneous relation INDIRECTLY-CONTAINS and
an instantaneous category SUPERITEM-OUTPUT, which would make
the above assertion come true. Output the SUPERITEM-OUTPUT
category and discard the temporary relation INDIRECTLY-CONTAINS.
Do not output any extra objects or relationships which are not needed
to make the assertion true.
|
|
Example 11-9.
The following is an assertion specifying the relation
CAN-IMPROVE-THE-GRADES-OF.
- for every s in STUDENT:
- for every i in INSTRUCTOR:
- (if i TAUGHT s
- then i CAN-IMPROVE-THE-GRADES-OF s) and
- for every middleman in STUDENT:
|
if
|
i TAUGHT middleman and
middleman CAN-IMPROVE-THE-GRADES-OF s
|
|
then
|
i CAN-IMPROVE-THE-GRADES-OF s
|
|
In a sense, this language is a superset of the Prolog-like database
languages. A Prolog-like database program can also be regarded as an
assertion about the output relations. However, the Prolog-like
languages allow only some assertions: those assertions which are
equivalent to the "insert until nothing new comes" interpretation.
|
Example 11-10.
It is known that several items have more components than the
item CAR. Display one of those items.
- exists output-item in OUTPUT-ITEM:
- (count car-component where `Car' CONTAINS: car-
component) <
- (count item-component where output-item CONTAINS
item-component)
|
The aggregate functions, like the function count in the above example,
are not strictly necessary in the language. Every query can be
expressed without aggregate
functions. The aggregate function can be regarded as convenient
abbreviations for nonaggregate constructs.
The following is an informal description of the semantics of a query
in this language. This is also a description of the semantics of the
Prolog-like languages (since every Prolog-like program can be regarded
as an assertion, the Prolog-like programs are a subset of all queries
expressed by assertions).
The last case appears in nondeterministic queries - the queries in
which the user does not wish to bother to specify what exact output
should be received, but only specifies some requirements to be
satisfied by the output.
|
Example 11-11.
Display the last name of one student.
- exists s in STUDENT:
- s.LAST-NAME is an OUTPUT-STUDENT-NAME
|
The capability of nondeterministic specification saves the user's
effort and also allows for a greater optimization potential.
|
Example 11-12.
In the above example, the system will fetch one student which
happens to the first in the physical access path to the
database. Had the user specified a specific student, it would
take longer to deliver that student from the database.
|
This language has no limitation of expressive power - every query that
can be programmed in any procedural data manipulation language also
can be specified nonprocedurally in
Rishe's language.
While the language can express any query, this generality might also
allow for unreasonable queries, that is, queries which would not make
any sense.
|
Example 11-13.
A query to find the average social security number of two
persons makes no sense.
A query to find the average between two persons, as if they
were integers, makes even less sense.
|
It may be desirable to prevent the user from asking such queries. Such
a constraint would both eliminate some user errors and improve the
efficiency of the system. An interesting feature of the language is
the capability to restrict itself to the reasonable queries by
accepting criteria of reasonability as a parameter. When such criteria
are given as a parameter, those queries which are unreasonable
according to the criteria are syntactically screened out. The
criteria are defined in terms of the scalar operators which make sense
in different concrete categories.
|
Example 11-14.
The operators meaningful on the final grades are +, -, <, >,
and so on. There are no operators except the equality
verification (=, =) on the IDs of the students. Constants of
the type STUDENT-ID are allowed. (Unlike that, there are no
constants in the abstract categories.)
|
The sublanguage restricted
according to the reasonability criteria can express every reasonable
query that can be programmed in any procedural language.
The language can also be used to specify integrity constraints,
inference rules, userviews, and update transactions. An update
transaction can be specified as two sets of facts: a set of old facts
to remove from the database and a set of new facts to insert into the
database. Those two sets of facts are extracted from an output of a
query. The output of a query (specified as an assertion) may contain
relations marked with a suffix -insert. The facts of those relations
are to be inserted. The facts of the relations marked with the suffix
-delete are to be deleted.
|
Example 11-15.
Remove all the grades.
- for every e in ENROLLMENT:
|
e
|
FINAL-GRADE-delete
|
s.FINAL-GRADE
|
|
|
Example 11-16.
Create a new student Veronica. Assume that the category
EXISTING-OBJECT is the supercategory of all the categories.
- exists s in STUDENT-insert:
- not (s is an EXISTING-OBJECT) and
|
s
|
FIRST-NAME-insert
|
`Veronica'
|
- (The condition `not (s is an EXISTING-OBJECT)' can be
stated implicitly.)
|
|
Example 11-17.
Enroll every student in the Databases course. It is assumed
that at least one offering of the course exists.
- for every s in STUDENT:
- exists e in ENROLLMENT-insert:
- not (e is an EXISTING-OBJECT) and
- e.THE-STUDENT-insert=s and
- e.THE-OFFERING-insert.THE-COURSE.NAME = `Databases'
|
A problem with this language is that there is no practical efficient
implementation for the language. Nevertheless, the language is useful
for the following purposes:
- As a high-level language in which to write the specification of a
problem before that specification is translated into a lower-
level program in the language supported by the actual DBMS
- As a language model from which sublanguages can be derived and
efficiently implemented
- As a tool to compare and evaluate the power of practical database
languages
- As a tool to reason about databases
The Predicate Calculus languages may be unfriendly to the
unsophisticated user. However, user-friendly "syntactic sugar" can,
and has been, added to some Predicate Calculus languages to enhance
their usability. This "sugar" can range from simple syntactic
abbreviations to menu-driven languages and to natural language
interfaces in which the user can enter a query in what might look like
plain English or Swahili.
An interesting user interface to a Relational Calculus-based language
is the Query-By-Example language developed by M. Zloof and now used
commercially. In this language, the user specifies a query by
drawing, with the system's assistance, tables on a two-dimensional
screen.
|
Example 11-18.
The following table will be an on-the-screen specification of
the query
- `Print the names and the seasons of the course
offerings prior to 1900.'
|
COURSE OFFERING
|
INSTRUCTOR-ID
|
COURSE-NAME
|
YEAR
|
SEASON
|
|
|
print
|
< 1990
|
print
|
|
Example 11-19.
The following table will be an on-the-screen specification of
the query
- `Print the names and the seasons of the course
offerings by the instructors who also taught
Databases.'
This query uses a variable _dbinstructor in order to specify a
relationship (join) between different rows of the table, so
that the related rows have the same value in the column
INSTRUCTOR-ID. The variables in this language are preceded by
an underscore (_).
|
COURSE OFFERING
|
INSTRUCTOR-ID
|
COURSE-NAME
|
YEAR
|
SEASON
|
|
_dbinstructor
|
print
|
< 1990
|
print
|
|
_dbinstructor
|
Databases
|
|
|
Some Prolog-based languages have user-friendly interfaces to subsets
of the languages. The Rishe language described in the previous section
also has a user-friendly interface which can be used only for
intermediate specifications of data manipulation tasks - it is not
used to write the actual programs since there is no efficient
implementation.
Reference Schemas
The following are the semantic and relational schemas for the
university case study application. These schemas are referred to in
most of the examples in this text.

Figure Ref-1. A semantic schema for a university
application.

Figure Ref-2. A relational schema for the university
application.