Chapter 11
*Fifth-Generation Languages

This chapter discusses several fifth-generation languages. Sections 1-3 address issues of expressive power of logic-based database languages and discuss Prolog-like languages and a logic-based language which attains computational completeness. Section 4 discusses user- friendly interfaces, using the Query-By-Example language as an example.

The optional sections of Chapter 3 are prerequisite to reading this chapter.

11.1. Limitations of Nonprocedural Database Languages

Not every query can be specified in the languages based on Predicate Calculus.

Example 11-1.

A person x can improve grades of a person y if x teaches y or x teaches a person who can improve grades of y. A query to find whether x can improve grades of y cannot be specified in the calculus.

Example 11-2.

Consider the following relational subschema of the bill of materiel of items. Each row in the table contains the names of two items where one item has the other as its immediate component.

COMPONENT

contained-item: String
containing-item: String

We cannot specify in Predicate Calculus or in SQL or in the Relational Algebra the following query:
`Print a list of all the pairs of items which directly or indirectly contain one another.'

The queries in the examples above are recursive queries. Some unexpressible recursive queries would become expressible if we extend the calculus with additional constructs, such as the so-called transitive closure operator or the more powerful fixed-point operator. But no matter what additional constructs we add to the Predicate Calculus, it will still be possible to encounter a query which cannot be specified in the extended language. A solution to this problem will be suggested in a later section.

11.2. Prolog-Like Languages

In the section on the calculus for transactions we have used an insert operation with the following syntax:

insert into category (relation1 : expression1 , ... , relationk : expressionk) where condition
Let us consider now a program which iteratively performs a set of insert statements and terminates when there is nothing more to insert.
Example 11-3.

Let ITEM-NAME be a concrete category of strings.

The following program augments the table COMPONENT by all the pairs of items which directly or indirectly contain one another.

repeat

insert into COMPONENT

(CONTAINING-ITEM: containing,

CONTAINED-ITEM: contained)

where

exists intermediate in ITEM-NAME:

COMPONENT (CONTAINING-ITEM: containing, CONTAINED-ITEM: intermediate) and

COMPONENT (CONTAINING-ITEM: intermediate, CONTAINED-ITEM: contained)

until nothing new has been inserted in the last iteration

A Prolog-like database program is an equivalent of a program with one repeat loop enclosing several insert statements:

repeat

insert-statements

until nothing new has been inserted in the last iteration

(We interpret the insert operation as adding only information which is not already there.)

Normally, Prolog-like languages allow only very primitive conditions within the insert statements: they do not allow the quantifier "for every" or complex expressions within those conditions. In exchange for this limitation, the Prolog-like languages can perform the program by a much faster algorithm than the obvious straightforward implementation of the loop.

A Prolog-like program can be used for retrieval of information from the database if instead of inserting new objects into the existing categories in the database we perform insertion into output tables.

Example 11-4.

The following program will produce a table OUTPUT-COMPONENT which will be composed of all the pairs of items which directly or indirectly contain one another. The input is the original table COMPONENT, which is composed of the pairs of items directly containing one another.

repeat

insert into OUTPUT-COMPONENT

(CONTAINING-ITEM: containing,

CONTAINED-ITEM: contained)

where

COMPONENT (CONTAINING-ITEM: containing, CONTAINED-ITEM: contained);

insert into OUTPUT-COMPONENT

(CONTAINING-ITEM: containing, CONTAINED-ITEM: contained)

where

exists intermediate in ITEM-NAME:

COMPONENT (CONTAINING-ITEM: containing, CONTAINED-ITEM: intermediate) and

OUTPUT-COMPONENT (CONTAINING-ITEM: intermediate, CONTAINED-ITEM: contained)

until nothing new has been inserted in the last iteration

(* The first insert statement will be performed only once. *)

The above description of Prolog-like languages has a strong procedural flavor because of the ``repeat . . . until'' loop. We can describe a Prolog-like program nonprocedurally - as an assertion which links the relations of the input database and the relations to be produced as the output of the query.
Example 11-5.

The above query can be nonprocedurally regarded as the following assertion:

for every containing in ITEM-NAME:

for every contained in ITEM-NAME:

if COMPONENT
(CONTAINING-ITEM: containing, CONTAINED-ITEM: contained)

then OUTPUT-COMPONENT

(CONTAINING-ITEM: containing, CONTAINED-ITEM: contained)

and

for every containing in ITEM-NAME:

for every contained in ITEM-NAME:

for every intermediate in ITEM-NAME:

if

COMPONENT (CONTAINING-ITEM: containing, CONTAINED-ITEM: intermediate) and

OUTPUT-COMPONENT (CONTAINING-ITEM: intermediate, CONTAINED-ITEM: contained)

then OUTPUT-COMPONENT

(CONTAINING-ITEM: containing, CONTAINED-ITEM: contained)

The pragmatic meaning of such an assertion is:

Output a set of tuples which, when regarded as a
table OUTPUT-COMPONENT, would make the above
assertion come true. Do not output any extra tuples
which are not needed to make the assertion true.

The Prolog-like languages described above are sometimes referred to as fifth-generation database languages, although this term can be broadly applied to all powerful nonprocedural database languages, particularly the languages based on Predicate Calculus.

The actual syntax of Prolog-like languages is usually somewhat different from the ``insert'' notation shown above, but it is equivalent to that notation. The loop specification is omitted. The quantifiers ``exists'' in the conditions are implicit. (No ambiguity arises since the ``for every'' quantifier is normally not allowed as a part of the insert conditions in Prolog-like languages.)

Example 11-6.

The above program would be written as follows in some Prolog-like languages:

OUTPUT-COMPONENT(CONTAINING-ITEM: containing, CONTAINED-ITEM: contained) <-

COMPONENT (CONTAINING-ITEM: containing, CONTAINED-ITEM: contained);

OUTPUT-COMPONENT (CONTAINING-ITEM: containing, CONTAINED-ITEM: contained) <-

COMPONENT (CONTAINING-ITEM: containing, CONTAINED-ITEM: intermediate),

OUTPUT-COMPONENT (CONTAINING-ITEM: intermediate, CONTAINED-ITEM: contained)

The expressive power of Prolog-like database languages, that is, the ability of these languages to express a wide range of queries, depends, in part, on what is allowed to appear on the right side of the <- statement. However, even if we allow arbitrary first-order Predicate Calculus assertions on the right side of the statements, there would still be many reasonable queries which cannot be specified in the Prolog-like database languages without sacrificing the nonprocedurality of the language. The cause of such limitation is the limit to what one can do with the "insert until nothing new comes" construct.

11.3. The Maximal Expressive Power

A nonprocedural language more powerful than the Prolog-like database languages has been proposed by N. Rishe.

In this language, a query is specified simply as an assertion about the relations to appear in the output, linking those relations to the information in the input instantaneous database.

Example 11-7.

The previously considered example on page ____, which was an assertion equivalent to a Prolog program producing the table of the pairs of components indirectly containing one another, is also an example of a query specification in the language discussed in this section.

Example 11-8.

The following is an assertion stating that the output shall display the item which indirectly contains all the items (if such a superitem exists).

We assume that the userview has

  • contains - relation from ITEM-NAME to ITEM-NAME (m:m)
The output is the category SUPERITEM-OUTPUT. This category will have at most one object - the superitem. In the formulation of the query we define a temporary relation INDIRECTLY-CONTAINS. The assertion states that this temporary relation relates all the pairs of items indirectly containing one another. (This includes an item containing itself.) This temporary relation will not become a part of the output.

for every item in ITEM-NAME:

item INDIRECTLY-CONTAINS item

and

for every containing in ITEM-NAME:

for every contained in ITEM-NAME:

for every intermediate in ITEM-NAME:

if containing CONTAINS intermediate and intermediate INDIRECTLY-CONTAINS contained
then containing INDIRECTLY-CONTAINS contained

and

for every si in ITEM-NAME:

if

(for every item in ITEM-NAME:

si INDIRECTLY-CONTAINS item)

then si is a SUPERITEM-OUTPUT

The pragmatic meaning of such an assertion is:

Create an instantaneous relation INDIRECTLY-CONTAINS and an instantaneous category SUPERITEM-OUTPUT, which would make the above assertion come true. Output the SUPERITEM-OUTPUT category and discard the temporary relation INDIRECTLY-CONTAINS. Do not output any extra objects or relationships which are not needed to make the assertion true.

Example 11-9.

The following is an assertion specifying the relation CAN-IMPROVE-THE-GRADES-OF.

for every s in STUDENT:

for every i in INSTRUCTOR:

(if i TAUGHT s

then i CAN-IMPROVE-THE-GRADES-OF s) and

for every middleman in STUDENT:

if i TAUGHT middleman and middleman CAN-IMPROVE-THE-GRADES-OF s
then i CAN-IMPROVE-THE-GRADES-OF s

In a sense, this language is a superset of the Prolog-like database languages. A Prolog-like database program can also be regarded as an assertion about the output relations. However, the Prolog-like languages allow only some assertions: those assertions which are equivalent to the "insert until nothing new comes" interpretation.

Example 11-10.

It is known that several items have more components than the item CAR. Display one of those items.

exists output-item in OUTPUT-ITEM:

(count car-component where `Car' CONTAINS: car- component) <

(count item-component where output-item CONTAINS item-component)

The aggregate functions, like the function count in the above example, are not strictly necessary in the language. Every query can be expressed without aggregate functions. The aggregate function can be regarded as convenient abbreviations for nonaggregate constructs.

The following is an informal description of the semantics of a query in this language. This is also a description of the semantics of the Prolog-like languages (since every Prolog-like program can be regarded as an assertion, the Prolog-like programs are a subset of all queries expressed by assertions).

The last case appears in nondeterministic queries - the queries in which the user does not wish to bother to specify what exact output should be received, but only specifies some requirements to be satisfied by the output.

Example 11-11.

Display the last name of one student.

exists s in STUDENT:

s.LAST-NAME is an OUTPUT-STUDENT-NAME

The capability of nondeterministic specification saves the user's effort and also allows for a greater optimization potential.

Example 11-12.

In the above example, the system will fetch one student which happens to the first in the physical access path to the database. Had the user specified a specific student, it would take longer to deliver that student from the database.

This language has no limitation of expressive power - every query that can be programmed in any procedural data manipulation language also can be specified nonprocedurally in Rishe's language.

While the language can express any query, this generality might also allow for unreasonable queries, that is, queries which would not make any sense.

Example 11-13.

A query to find the average social security number of two persons makes no sense.

A query to find the average between two persons, as if they were integers, makes even less sense.

It may be desirable to prevent the user from asking such queries. Such a constraint would both eliminate some user errors and improve the efficiency of the system. An interesting feature of the language is the capability to restrict itself to the reasonable queries by accepting criteria of reasonability as a parameter. When such criteria are given as a parameter, those queries which are unreasonable according to the criteria are syntactically screened out. The criteria are defined in terms of the scalar operators which make sense in different concrete categories.

Example 11-14.

The operators meaningful on the final grades are +, -, <, >, and so on. There are no operators except the equality verification (=, =) on the IDs of the students. Constants of the type STUDENT-ID are allowed. (Unlike that, there are no constants in the abstract categories.)

The sublanguage restricted according to the reasonability criteria can express every reasonable query that can be programmed in any procedural language.

The language can also be used to specify integrity constraints, inference rules, userviews, and update transactions. An update transaction can be specified as two sets of facts: a set of old facts to remove from the database and a set of new facts to insert into the database. Those two sets of facts are extracted from an output of a query. The output of a query (specified as an assertion) may contain relations marked with a suffix -insert. The facts of those relations are to be inserted. The facts of the relations marked with the suffix -delete are to be deleted.

Example 11-15.

Remove all the grades.

for every e in ENROLLMENT:

e

FINAL-GRADE-delete s.FINAL-GRADE

Example 11-16.

Create a new student Veronica. Assume that the category EXISTING-OBJECT is the supercategory of all the categories.

exists s in STUDENT-insert:

not (s is an EXISTING-OBJECT) and
s FIRST-NAME-insert `Veronica'

(The condition `not (s is an EXISTING-OBJECT)' can be stated implicitly.)

Example 11-17.

Enroll every student in the Databases course. It is assumed that at least one offering of the course exists.

for every s in STUDENT:

exists e in ENROLLMENT-insert:

not (e is an EXISTING-OBJECT) and

e.THE-STUDENT-insert=s and

e.THE-OFFERING-insert.THE-COURSE.NAME = `Databases'

A problem with this language is that there is no practical efficient implementation for the language. Nevertheless, the language is useful for the following purposes:

11.4. User-Friendly Interfaces

The Predicate Calculus languages may be unfriendly to the unsophisticated user. However, user-friendly "syntactic sugar" can, and has been, added to some Predicate Calculus languages to enhance their usability. This "sugar" can range from simple syntactic abbreviations to menu-driven languages and to natural language interfaces in which the user can enter a query in what might look like plain English or Swahili.

An interesting user interface to a Relational Calculus-based language is the Query-By-Example language developed by M. Zloof and now used commercially. In this language, the user specifies a query by drawing, with the system's assistance, tables on a two-dimensional screen.

Example 11-18.

The following table will be an on-the-screen specification of the query

`Print the names and the seasons of the course offerings prior to 1900.'

COURSE OFFERING

INSTRUCTOR-ID COURSE-NAME YEAR SEASON
print < 1990 print

Example 11-19.

The following table will be an on-the-screen specification of the query

`Print the names and the seasons of the course offerings by the instructors who also taught Databases.'
This query uses a variable _dbinstructor in order to specify a relationship (join) between different rows of the table, so that the related rows have the same value in the column INSTRUCTOR-ID. The variables in this language are preceded by an underscore (_).

COURSE OFFERING
INSTRUCTOR-ID COURSE-NAME YEAR SEASON
_dbinstructor print < 1990 print
_dbinstructor Databases

Some Prolog-based languages have user-friendly interfaces to subsets of the languages. The Rishe language described in the previous section also has a user-friendly interface which can be used only for intermediate specifications of data manipulation tasks - it is not used to write the actual programs since there is no efficient implementation.

Reference Schemas
The following are the semantic and relational schemas for the university case study application. These schemas are referred to in most of the examples in this text.


Figure Ref-1. A semantic schema for a university application.


Figure Ref-2. A relational schema for the university application.