The Evolution Of Data Models And Approaches To Persistence In Database Systems

Abstract

In this essay I will present the most common approaches to DBMSs and give a brief introduction to each of them, while placing them in their historical context. I will concentrate on object-oriented DBMSs and object-relational DBMSs, and touch only briefly on the orthogonally persistent language approach. A discussion of the differences between the approaches, and of future options, will be given. The challenges of a multi-user run-time environment, e.g. locking, transactions and recovery mechanisms, will not be discussed.

The reader is expected to be familiar with object-oriented development, DBMSs, programming and programming languages in general.

Introduction

Computer science has evolved a lot since its beginnings in the late 1940s. With the arrival of widely available programming languages such as Fortran and Cobol, programming became the foundation for creating enterprise computer systems. The systems that were developed needed to store their data somewhere, and the programmers designed more or less proprietary and specialised solutions for this purpose. In 1964 the first commercial database management system (DBMS) was born: IDS - Integrated Data Store, developed at General Electric and based upon an early network data model developed by C.W. Bachman (Bachman 1965). In the late 1960s, IBM and North American Aviation (later Rockwell International) developed IMS - Information Management System, and its DL/1 language. This was the first commercial hierarchical DBMS. Both kinds of DBMSs (hierarchical and network) were accessible from the programming language (usually Cobol) using a low-level interface. This made the tasks of creating an application, maintaining the database, and tuning and developing it controllable, but still complex and time-consuming.

In 1970 Edgar F. Codd published an article which offered a fundamentally different approach (Codd 1970). Codd suggested that all data in a database could be represented as a tabular structure (tables with columns and rows, which he called relations) and that these relations could be accessed using a high-level non-procedural (or declarative) language. Instead of writing algorithms to access data, this approach only needed a predicate that identified the desired records or combination of records. This promised higher programmer productivity, and at the beginning of the 1980s several relational DBMS (RDBMS) products emerged (e.g. Oracle, Informix, Ingres and DB2).

As the DBMSs evolved, so did the programming languages. In 1967 Simula, the first object-oriented programming language, was born. Simula was developed as a foundation for writing simulation programs, and contained the now familiar class concept. Several other programming languages adopted the class concept from Simula (e.g. C++, Java, Eiffel, and Smalltalk) and continued to evolve more or less independently of the DBMSs.

In the early 1980s research started on another kind of database. This research was motivated, among other things, by the need for a database system capable of handling complex objects and structures like those used in CAD, CASE and OIS systems (Zdonik 1994). To accomplish these tasks the database had to be able to store classes and objects together with the objects' associations and methods, and so the object-oriented DBMS (OODBMS) emerged. By the late 1980s several vendors had developed OODBMSs (e.g. Object Design, Versant, O2 and Objectivity).

In the late 1980s OODBMSs were no threat to the by then large commercial vendors developing and selling hierarchical, network or relational databases. In 1991 ODMG (Object Database Management Group) was founded, mainly thanks to Rick Cattell of JavaSoft, and in 1993 several vendors of OODBMSs agreed upon an OODBMS standard called ODMG-93. The relational databases already had their standard - SQL-92, defined by its ANSI committee and ISO. So did the network database vendors: CODASYL (defined in 1986 by the ANSI X3H2 committee).

The founding of ODMG and the fact that object-oriented programming languages became more and more used may well have been the major driving forces when the ANSI X3H2 committee started its work on SQL3 in 1992. This proposal put another type of DBMS on the arena - the object relational DBMS (ORDBMS).

While all this was happening, more and more programmers converted from C and other languages to C++. C++ was becoming the most used object-oriented language, but C++ applications were not always easy to develop and maintain. Such applications often suffered from memory leaks, erroneous pointers and similar problems.

In 1991 Sun's Green Team started the development of a new programming language which was loosely based on C++. The language was named Oak after the trees outside the office window of the language designer - James Gosling. In 1992 Sun turned the Green Team into a fully owned company, called First Person Inc. In 1993 the National Center for Supercomputing Applications introduced Mosaic, a WWW browser, and the Internet began to bustle with traffic. Soon other WWW browsers followed. In 1994 First Person built an Oak-ready browser called WebRunner, and Sun backed the decision to give the language (Oak) away for free, but first Oak was renamed Java and WebRunner was renamed HotJava. Java became available to millions of people due to Netscape's bundling of Java, and soon others followed (Bank 1995).

Since the late 1970s M.P. Atkinson has been doing research into databases, persistence and their applications, and he was one of the major contributors to the first orthogonally persistent language, PS-algol (Atkinson et al. 1983). The term and concept "orthogonal persistence" will be explained later. Atkinson has continued his work and is currently working on the PJama project at the University of Glasgow. PJama is an orthogonally persistent version of Java. Today Java is regarded by many as the "hottest" language, and OODBMSs are regarded by many as the "hottest" DBMSs as well.

Database concepts.

As history shows, conceptually different DBMSs have been developed, and these DBMSs have different capabilities, both regarding organisation of data, modelling of data and access to data. In this section I will dive a little further into the different DBMSs and especially concentrate on ORDBMSs and OODBMSs. I will also present the recent academic approaches towards persistent languages and explain what orthogonal persistence is, but first I will explain the concepts of hierarchical, network and relational DBMSs.

Hierarchical

Hierarchical DBMSs became commercially available in the late 1960s with IBM's IMS and the DL/1 language. The hierarchical DBMSs organise and model their data in a hierarchical fashion as a collection of trees. Every tree has one and only one root record - the hierarchical owner of the data instance - and every other record has a unique parent record. Access to data in a hierarchical database is done through low-level calls which the programmer writes into his or her programs using a navigational language, navigating from the root records to the actual record of interest (Silberschatz et al. 1991). To be able to do this, the programmer must know the physical representation of the database.

This approach is well suited for large systems containing a lot of data where the data can be organised in a hierarchical way without compromising the information. Hierarchical databases support two types of information: the record type, which is a record containing data, and the parent-child relationship (PCR), which defines a 1:N relationship between one parent record and N child records. Hierarchical databases are still used in many systems and IMS is still the leading hierarchical DBMS. On the other hand, this approach has several major limitations due to its representation of data. It is not trivial to represent a non-hierarchical structure of information in such a database (Elmasri & Navathe 1994).
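The root-to-record navigation described above can be illustrated with a minimal sketch. It is not DL/1 and does not model IMS; the `Record` class, the `navigate` function and the sample record types are hypothetical names chosen only for this illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Record:
    """One record occurrence; every record except a root has exactly one parent."""
    record_type: str
    key: str
    parent: Optional[Record] = None
    children: list = field(default_factory=list)

    def add_child(self, record_type: str, key: str) -> Record:
        # A parent-child relationship (PCR) is 1:N: one parent, many children.
        child = Record(record_type, key, parent=self)
        self.children.append(child)
        return child


def navigate(root: Record, path: list) -> Optional[Record]:
    """Walk from the root towards a record, descending one level per key."""
    current = root
    for key in path:
        current = next((c for c in current.children if c.key == key), None)
        if current is None:
            return None  # the path does not exist in this tree
    return current


# Hypothetical sample data: DEPARTMENT -> EMPLOYEE -> DEPENDENT
department = Record("DEPARTMENT", "Research")
employee = department.add_child("EMPLOYEE", "Smith")
employee.add_child("DEPENDENT", "Alice")

found = navigate(department, ["Smith", "Alice"])
```

Note that the caller has to know the shape of the tree to build the path, which mirrors the dependence on the physical representation mentioned above.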

Network

C.W. Bachman developed the first commercial network DBMS (Bachman 1965). The system - IDS - was available in 1964. The modelling paradigm of the network database is somewhat different from that of its hierarchical counterpart. A network database arranges its data as a directed graph and has a standard navigational language (DBTG 1971). This paradigm made it possible to move directly from one specific entry point in a data set to another record in another data set (Silberschatz et al. 1991).

A network database offers an efficient access path to its data and is capable of representing almost any informational structure containing simple types (e.g. integers, floats, strings and characters). This is accomplished using different kinds of mapping mechanisms called sets. A set is a container of pointers identifying which sets of data can be reached from the current record. Three sets are defined by the CODASYL standard - singular/system sets, multimember sets and recursive sets. Using these sets, the database designer and programmer may represent and navigate 1:1, 1:N and N:M relationships (Elmasri & Navathe 1994). To be able to do this, the programmer has to know the physical representation of the database and access the database using a low-level navigational language (Bachman 1973). This approach to DBMSs is more flexible than the hierarchical approach, but the programmer still has to know the physical representation of data to be able to access it, and accordingly applications using a network database have to be changed every time the structure of the database changes.
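As a rough illustration of the set mechanism, the sketch below stores records and lets a named set link an owner record to its member records. It is a toy model under stated assumptions, not the CODASYL data manipulation language; `NetworkDatabase`, `connect`, `members` and the set name `PLACES` are hypothetical.

```python
from collections import defaultdict


class NetworkDatabase:
    """Toy store: records plus named sets linking an owner to member records."""

    def __init__(self):
        self.records = {}                # record id -> dict of fields
        self.sets = defaultdict(list)    # (set name, owner id) -> member ids

    def store(self, record_id, **fields):
        self.records[record_id] = fields

    def connect(self, set_name, owner_id, member_id):
        # Add a member to the set occurrence owned by owner_id.
        self.sets[(set_name, owner_id)].append(member_id)

    def members(self, set_name, owner_id):
        # .get avoids creating an empty set occurrence as a side effect.
        member_ids = self.sets.get((set_name, owner_id), [])
        return [self.records[m] for m in member_ids]


db = NetworkDatabase()
db.store("C1", name="Acme")
db.store("O1", total=100)
db.store("O2", total=250)
db.connect("PLACES", "C1", "O1")   # customer C1 owns two orders (1:N)
db.connect("PLACES", "C1", "O2")

# Navigate from the customer record to its orders through the set.
totals = [order["total"] for order in db.members("PLACES", "C1")]
```

The application code names the set and the owner explicitly, so a change in how records are linked forces a change in the program, as the paragraph above points out.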

Relational

The relational data model was first presented by Edgar F. Codd in (Codd 1970) and later in a series of papers up to 1972 (Codd 1971a, Codd 1971b, Codd 1971c, and Codd 1972). The model offers a conceptually different approach to data storage. In the relational database, all data is represented as simple tabular data structures (relations) which may be accessed using a high-level non-procedural language. This language is used to gain access to the relations and the desired set of data, and the programmer does not have to write algorithms for navigation. With this approach the physical implementation of the database is hidden, so the programmer does not have to know it to be able to access the data. In 1974 Chamberlin and others at IBM proposed such a high-level non-procedural language - SEQUEL, which later, due to legal problems, was renamed SQL (Date 1990). In 1986 the ANSI X3H2 committee accepted SQL as an ANSI standard, and soon SQL's biggest competitor, QUEL, vanished.

The relational approach separates the program from the physical implementation of the database, making the program less sensitive to changes of the physical representation of the data by unifying data and metadata in the database. This makes the development of programs more effective and less dependent on changes in the physical representation of data (Gray 97).

SQL and RDBMSs have become widely used due to the separation of the physical and logical representation (and marketing of course). It is much easier to understand tables and records than pointers and pointers to pointers to records. Most significantly, research showed that a high-level relational query language could give performance comparable to the best record-oriented databases (Gray 97).
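The declarative style described in this section can be shown with a small sketch using Python's standard `sqlite3` module and an in-memory database. The `employee` table and its contents are invented for the illustration; the point is only that the program states a predicate and leaves the access path to the DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        ("Smith", "Research", 52000),
        ("Jones", "Sales", 41000),
        ("Wong", "Research", 61000),
    ],
)

# The query names the desired rows by predicate; no navigation code is written.
rows = conn.execute(
    "SELECT name FROM employee WHERE department = ? AND salary > ? ORDER BY name",
    ("Research", 50000),
).fetchall()
names = [name for (name,) in rows]
conn.close()
```

If an index were later added to the table, the query text would stay the same, which is the separation of logical and physical representation discussed above.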

Object-relational

The object-relational DBMS (ORDBMS) is the newest commercial breed of DBMS. It embraces some object-oriented features and incorporates them into an RDBMS. ORDBMSs are mainly based upon the criteria defined by Stonebraker et al. (1990). This manifesto was written in opposition to Atkinson et al. (1989). Stonebraker et al. suggest extending the capabilities of an RDBMS to include support for richer object structures and rules while remaining open to other systems. This is expressed as thirteen propositions, stating the requirements for an ORDBMS as extensions to an RDBMS.

The first tenet in (Stonebraker et al. 1990) is concerned with richer objects and rules. This includes inheritance, abstract data types (ADTs), and a number of different constructors, e.g. sets, lists and bags, needed to operate on objects or collections of objects. Stonebraker et al. (1990) suggest that all functions which involve data should be written in a high-level, non-procedural access language (HLL), to avoid programming against low-level interfaces that depend on the physical implementation of the database.

The second tenet is concerned with increasing DBMS function: how the functions should be written and how they should access data. Stonebraker et al. (1990) state that essentially all programmatic access to the DBMS should be through the HLL while ensuring data independence. This should be accomplished by including updateable views, enumeration of members in collections combined with the HLL's capabilities to specify membership, and by avoiding all kinds of low-level access dependent on the physical implementation. The ORDBMS should also, according to Stonebraker et al. (1990), keep backward compatibility with RDBMSs, making porting from an RDBMS to an ORDBMS easy.

The third tenet concerns openness and the ORDBMS's application programming interface (API). Stonebraker et al. (1990) put forward the idea that ORDBMSs should be accessible from multiple HLLs, and that all of these HLLs should be based upon SQL. Persistent programming languages should be supported on top of a single DBMS by compiler extensions to the programming language, and a runtime system.

The ORDBMS camp is however split. Darwen & Date (1995) have written an opposing manifesto to Stonebraker et al. (1990) and Atkinson et al. (1989), suggesting a completely different set of guidelines for future DBMSs. First of all, Darwen & Date state that SQL is not suited as the base language for future DBMSs (they feel strongly that "any attempt to move forward, if it is to stand time, must reject SQL unequivocally"). They claim that the foundation of the DBMSs must be firmly rooted in the relational model of data, that the relational model is capable of including object-oriented features, and that a new HLL has to be developed (and SQL omitted).

Darwen & Date and Stonebraker et al. agree on several important issues (e.g. use of the relational model and development of a new HLL), but disagree on others (e.g. SQL and multiple inheritance).

In 1992 the ANSI X3H2 committee started its work on a new object-relational standard SQL3 (SQL3 is the ORDBMS standard) which was scheduled to be finished in 1995. The committee is still working on it and the expected timeframe for completion is currently 1999 (JCC 1998).

Since no standard currently exists, no vendors can truthfully claim to be SQL3 compliant, but some vendors have made major changes to their RDBMSs to include object-relational features (e.g. ORACLE, IBM's DB2, and Informix). The ORDBMS models its data based upon the extensions to the relational model and relational calculus. An ORDBMS is capable of storing ADTs or basic types in records, lists, bags etc. and can access the data using an extended version of SQL. The result is some kind of hybrid DBMS, half relational, half object-oriented. Whether or not this is a good idea is not the subject of this essay.

Object-oriented

The conceptual paradigm of the object-oriented DBMS (OODBMS) is quite different from the other approaches presented. OODBMSs offer persistence to objects, including the object's associations and methods. Atkinson et al. (1989) gave a guideline on the requirements for an OODBMS, and this manifesto is still a very good guideline on OODBMSs. In 1993 ODMG put forward its ODMG-93 standard, and later, in 1997, the ODMG 2.0 standard, which today is the de facto standard for OODBMSs (Cattell 1994 and Cattell et al. 1997). Before I look deeper into the concepts of the OODBMS, orthogonal persistence has to be defined and understood.

Orthogonal persistence

Several definitions of orthogonal persistence have been proposed (e.g. Atkinson et al. 1983, and Atkinson & Morrison 1995). These definitions vary somewhat, but the definition given by Atkinson & Morrison (1995) expresses the term precisely and is the definition I will refer to as orthogonal persistence:

Quote.

The Principle of Persistence Independence
The form of a program is independent of the longevity of the data that it manipulates. Programs look the same whether they manipulate short-term or long-term data.
The Principle of Data Type Orthogonality
All data objects should be allowed the full range of persistence irrespective of their type. There are no special cases where objects are not allowed to be long-lived or are not allowed to be transient.
The Principle of Persistence Identification
The choice of how to identify and provide persistent objects is orthogonal to the universe of discourse of the system. The mechanism for identifying persistent objects is not related to the type system.

The application of these three principles yields Orthogonal Persistence.

Unquote.

This definition pinpoints the actual differences between an object-oriented language supporting orthogonal persistence and some other DBMS. The OODBMS (a property of the object-oriented orthogonally persistent language) has to be interfaced to the programming language in such a manner that any program, without any changes, may hold short-term data (which exists only while a program is running) or long-term data (which outlives the program, on file or in a database). In other words, the program should not have to be changed to offer its objects persistence, resulting in a total removal of the impedance mismatch. Thus a traditional OODBMS (not supporting orthogonal persistence) offering object persistence will retain some of the impedance mismatch. The different aspects of orthogonal persistence and impedance mismatch will be discussed later in the section "Persistence strategies and the impedance mismatch".

Object persistence, including orthogonal persistence, is often achieved using the persistence by reachability concept, which makes an object persistent if it is reachable from a persistent root. This is a completely different approach from the network/hierarchical approach using a low-level navigational language (Silberschatz et al. 1991), and it is also different from the RDBMS/ORDBMS approach using an HLL for navigation, query and manipulation, or SQL combined with some data definition language (DDL).

In an OODBMS anything represented as an object, or part of an object, may be stored in the database, regardless of type. All other DBMSs, except the ORDBMSs, are only able to handle simple types and in some cases simple objects (e.g. BLOBs) or collections. The ORDBMS data model limits its own ability to model data through its organisation of data in tables. Darwen & Date (1995) and Stonebraker et al. (1991) agree that the relational model and calculus have to be the foundation of the next generation of database systems. The OODBMSs lack a common data model like that of the ORDBMS/RDBMS, and some consider this a major weakness of the OODBMSs (e.g. Darwen & Date 1995 and Stonebraker et al. 1991), while others look upon it from a different perspective - OODBMSs are new and only research into the area will give the answer (e.g. Beeri 1990 and Maier 1989).

OODBMS Concepts

Currently there is no consensus on an object-oriented data model (OODM), but several proposals exist (e.g. Cattell et al. 1997, TM-Language 1996, and Sarkar & Reiss 1992). All of these models represent actual objects and their associations as data models, and should not be confused with analysis and design methods like OMT by Rumbaugh et al. (1991), OOA/OOD by Coad & Yourdon (1991a, 1991b), the Booch method by Grady Booch (1994), OOram by Reenskaug et al. (1995), etc. The OODMs differ on several issues, e.g. multiple versus single inheritance, and whether the operations/methods or only their definitions should be stored. These differences are caused, among other things, by the different capabilities of the OO languages (or combination of languages) which are to use the OODB. E.g. C++ supports multiple inheritance, so any OODBMS that supports object persistence with C++ as the programming language must support multiple inheritance. If the OODBMS supports different programming languages, it probably should store the interfaces of the methods rather than the methods themselves. With problems of this kind, it is very difficult to reach an agreement on all parts of the modelling perspective.

The lack of a commonly agreed data-model and definition language is by some argued to be one of the biggest problems associated with OODM (Darwen & Date 1995), and it is a problem, but it is not a conceptual problem - it is an implementation problem.

Currently ODMG 2.0 is the de facto standard among OODBMSs, and it supports a common object-model with database functionality, an object query language (an SQL look-alike, only including SELECT-statements) - OQL, an object definition language - ODL, and finally a set of bindings to some programming-languages - C++, Java and Smalltalk (Cattell et al. 1997).

I have now presented the most common approaches to DBMS, but another group is also emerging - the persistent programming languages.

Orthogonally persistent programming languages

Persistent programming languages are special breeds of languages where the database and the programming language are merged into one system at runtime (or, if possible, at compile time). This approach was first investigated when M.P. Atkinson and others started to develop PS-Algol at the beginning of the 1980s (Atkinson et al. 1983) and later with the development of Amber (Cardelli 1984), Galileo (Albano et al. 1985), Napier88 (Dearle et al. 1989), and OPAL (Servio 1990). PS-Algol was a research attempt to include orthogonal persistence and persistence by reachability in a standard programming language, without changing the base language, while trying to keep the extensions to the language at a minimum. This approach is cleaner than the general OODBMS approach: it avoids the problems that arise when different programming languages are used to access the database, and it removes the impedance mismatch. Thus the database and the language, in a fully developed version, have identical properties. These attempts have been based on languages supporting an OO paradigm, but at least one exception exists: MUMPS.

These approaches have been more academically driven than the others, and in 1996 another attempt was started, an attempt to design a type-safe, object-oriented, orthogonally persistent programming language, based on Java (Atkinson et al. 1996a, 1996b). The research and development has continued and a number of articles related to this programming language, now called PJama, are being published (e.g. Atkinson et al. 1996a, 1996b, Jordan 1996, Spence & Atkinson 1997, and Prentiz et al. 1997).

The persistent programming languages achieve orthogonal persistence using a quite different implementation technique from that of the more commercial OODBMSs. The binding of persistence to the objects is done dynamically at runtime by the runtime system, while the commercial OODBMSs achieve this by pre- and/or post-processing the code generated by the compiler, inserting the persistence calls before runtime execution. This difference will be discussed in the following section.

Similarities and differences - ORDBMS, OODBMS and PJama

ORDBMSs are based on the relational model first described by Codd (1970). The relational model models its data as relations (tables), and DBMS metadata describes each relation (e.g. name, column names, number of rows and different constraints). The ORDBMSs should be able to represent object-oriented features like inheritance, polymorphism, embedded objects, complex objects, sets, lists, bags etc. in these relations, and Darwen & Date (1995) claim that this is possible if SQL is omitted as access language and a new access language is used. Darwen & Date (1995) strongly disagree with Stonebraker et al. (1991) when the latter state that "For better or worse, SQL is intergalactic dataspeak". Darwen & Date and Stonebraker et al. do agree on the basic foundation of the ORDBMS - the next generation of databases should be based on the relational model - but disagree on whether this is possible without compromising the OO paradigm. Darwen & Date claim that all object-oriented constructs may be accommodated in an ORDBMS without extending the relational model, while Stonebraker et al. suggest extending the relational model to accommodate some features (e.g. inheritance and complex objects). Neither of these approaches removes the "impedance mismatch" between object-oriented languages and the database system they access.

OODBMSs lack this commonly agreed model, as recognised by many researchers (Atkinson et al. 1989, TM-Language 1996, and Cattell et al. 1997). The lack of an OODM has another problem attached to it: the lack of a commonly agreed object calculus. The proponents of (O)RDBMSs often use these arguments, but whether they are good arguments is open to discussion. In the mid 1980s QUEL and SQL were competing as query languages for RDBMSs. QUEL was much closer to the tuple calculus than SQL (Date 1990, Elmasri & Navathe 1994), yet SQL was chosen as the standard query language in 1986. SQL had become the de facto standard and it also became the de jure standard. A similar situation is currently developing for the OODBMSs: ODMG 2.0 is becoming the de facto standard because the biggest OODBMS vendors commit to it, not because it necessarily is the best. The OODBMSs will not remove the "impedance mismatch" as long as they support several programming languages (by supporting several programming languages they will no longer be orthogonally persistent), but they will reduce it and in some cases (when orthogonal persistence is achieved) totally remove it (Dittrich 1994).

OQL, the ODMG 2.0 query language, is syntactically very close to SQL3, but differs on some points. E.g. when a SELECT statement is executed in SQL3 the result is a structure containing attributes of objects, while in OQL the result will be a set of objects (Melton 1995). OQL also does not include statements for changing, inserting or deleting data. SQL3 is also bigger than OQL since it contains both a data manipulation language (DML) and a DDL. ODMG has chosen to separate the DDL, called Object Definition Language (ODL), from OQL (Wade 1996b).

The most evident difference between an OODBMS and an ORDBMS, from a programmer's perspective, will be the difference in the impedance mismatch. The impedance mismatch will be far smaller when complex objects and structures are accessed in an OODBMS than in an ORDBMS. On the other hand, the ORDBMS, with its SQL3, gives more flexibility to the end-user while assuring a forward compatibility from SQL2. The DML part of SQL3 makes it possible to alter data in the database without accessing the data from some application that defines constraints to the data-model (Melton 1996). This makes constraints and integrity important issues to the ORDBMS data-model, while in the OODBMS this can only be done from the application.

PJama, an orthogonally persistent, type-safe, object-oriented programming language, includes, among other things, an OODBMS. The differences and similarities between OODBMSs and ORDBMSs have already been briefly discussed, and they apply to PJama as well. PJama has other properties though. PJama totally removes the impedance mismatch - the only difference between a transient and a persistent object is whether or not it is reachable from a persistent root (Atkinson et al. 1996a). This is not very different from the OODBMS approach already discussed (this is true for the Java and Smalltalk bindings, but more restrictions on reachability are defined by the C++ binding) (Cattell et al. 1997). In PJama the reachability paradigm is implemented through minor changes to the Java language, while the OODBMS approach (based on ODMG 2.0) makes it possible using special keywords in combination with pre-/post-processing of the byte-code (which updates metadata of the OODBMS). This difference makes the dynamic and static binding of the two approaches different. PJama does its binding at runtime (dynamically), while OODBMSs using the ODMG standard bind partly statically (type and class bindings) and partly dynamically (object bindings).

Persistence strategies and the impedance mismatch.

The commercial OODBMSs that claim compliance with ODMG 2.0 do not fully support orthogonal persistence as defined by Atkinson & Morrison (1995). The ODMG standard defines language bindings for C++, Smalltalk and Java, and provides a common data model as well (ODMG 1998). To access the model from a programming language, the language bindings and the differences in the languages' type systems become essential. E.g. C++ and Java are mostly statically typed while Smalltalk is dynamically typed, C++ supports multiple inheritance while Java only supports multiple inheritance of interfaces, and the runtime environments of the languages have completely different properties.

ODMG 2.0 states that a database may be accessible from any of these languages. All of these differences have an impact on the persistence strategy chosen by an OODBMS vendor (e.g. the C++ binding in ObjectStore uses persistence by instantiation, while the Java binding uses persistence by reachability).

An introduction to the most common persistence strategies is available in Khoshafian (1993).

Persistence by inheritance.

A class inherits persistence capabilities from a pre-defined persistent class.

An object instantiated from this class may be either persistent or transient, but the operation of moving the data between the transient and the persistent area must be performed explicitly (Lausen & Vossen 1997). Thus this strategy does not support orthogonal persistence (identification of persistent objects is closely related to the type system), and it still holds a small impedance mismatch (explicit inheritance is necessary and the operation of moving data between the transient and the persistent area has to be performed explicitly).
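A minimal sketch of this strategy is given below. It does not reproduce any vendor's binding; `PersistentObject`, its `save` and `load` methods, and the in-memory dictionary standing in for the persistent area are all hypothetical.

```python
class PersistentObject:
    """Hypothetical base class: only its subclasses can be stored."""

    _store = {}  # stands in for the persistent area

    def save(self, key):
        # Explicit move from the transient to the persistent area.
        PersistentObject._store[key] = dict(vars(self))

    @classmethod
    def load(cls, key):
        # Explicit move back: rebuild an instance from the stored state.
        obj = cls.__new__(cls)
        obj.__dict__.update(PersistentObject._store[key])
        return obj


class Account(PersistentObject):
    def __init__(self, owner, balance):
        self.owner = owner
        self.balance = balance


class Point:
    """Does not inherit from PersistentObject, so it cannot be stored."""

    def __init__(self, x, y):
        self.x, self.y = x, y


account = Account("Smith", 100)
account.save("acc-1")
account.balance = 0                 # changes only the transient copy
restored = Account.load("acc-1")    # still holds the saved balance
```

Both symptoms named above are visible here: whether an object can persist depends on its type (`Point` cannot), and the program must call `save` and `load` itself.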

Persistence by instantiation.

An object is made persistent and gets its persistent capabilities when instantiated.

Obviously this strategy does not support orthogonal persistence (making an object persistent is done explicitly). Another problem connected to this strategy is what happens if a persistent object references a transient one. This may put referential integrity in danger and is often solved by using inverse relationships. Using this kind of relationship makes the design and implementation of classes more rigid, and thus an impedance mismatch is created.
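The referential problem can be made concrete with a small sketch. The `Database` class, its `new` method and the `Customer` and `Order` classes are hypothetical and serve only to show a persistent object holding a reference to a transient one.

```python
class Database:
    """Hypothetical store: objects are persistent only if created through new()."""

    def __init__(self):
        self.persistent = []

    def new(self, cls, *args):
        # Persistence is decided at instantiation time.
        obj = cls(*args)
        self.persistent.append(obj)
        return obj

    def is_persistent(self, obj):
        return any(obj is p for p in self.persistent)


class Customer:
    def __init__(self, name):
        self.name = name


class Order:
    def __init__(self, customer):
        self.customer = customer


db = Database()
transient_customer = Customer("Smith")        # created normally, so transient
order = db.new(Order, transient_customer)     # persistent from instantiation

# Persistent orders whose customer would be lost when the program ends.
dangling = [
    o for o in db.persistent
    if isinstance(o, Order) and not db.is_persistent(o.customer)
]
```

In this sketch nothing stops the persistent `order` from pointing at a transient `Customer`, which is the situation the paragraph above describes.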

Persistence by reachability.

An object is made persistent when it is reachable from another persistent object.

This is the only persistence strategy that holds promises of a total removal of the impedance mismatch while supporting orthogonal persistence.
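Persistence by reachability amounts to computing the transitive closure of references from a persistent root. The sketch below assumes a hypothetical `Node` class whose `refs` list holds its references; it is not how PJama or any ODMG binding is implemented.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.refs = []   # references to other nodes


def persistent_closure(root):
    """Collect every node reachable from the persistent root (depth-first)."""
    seen = set()
    closure = []
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue          # already visited; also guards against cycles
        seen.add(id(node))
        closure.append(node)
        stack.extend(node.refs)
    return closure


root = Node("root")
customer = Node("customer")
order = Node("order")
scratch = Node("scratch")

root.refs.append(customer)
customer.refs.append(order)
order.refs.append(customer)    # a cycle between customer and order
scratch.refs.append(order)     # scratch points in, but nothing reaches it

names = sorted(node.name for node in persistent_closure(root))
```

No class inherits from a special base and no object is marked at creation; `scratch` stays transient simply because no path leads to it from the root.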

The OODBMSs have originated from different camps, resulting in conceptual differences in their design (Lausen & Vossen 1997). The design of an OODBMS of course has an impact on the strategy chosen when implementing persistence capabilities for a given programming language. For an OODBMS to be able to support different programming languages, some tradeoffs have to be made, and these tradeoffs often result in a small impedance mismatch and incomplete orthogonal persistence (Dittrich 1994).

The only truly orthogonally persistent OODBMSs will be those that support persistence by reachability for all language bindings. None of the OODBMS vendors who claim ODMG 2.0 compliance have accomplished this while being fully ODMG 2.0 compliant. E.g. O2 uses persistence by reachability in its C++ binding, but is not compliant with ODMG 2.0 concerning ODL; ObjectStore uses persistence by instantiation in its C++ binding; and Versant and Objectivity use persistence by inheritance in their C++ bindings.

The next generation of DBMS, the near future and beyond 2000.

Since no time machine currently exists, we must rely on past and current work to predict the future. Programming languages have evolved from coding binary sequences to procedural languages (e.g. Cobol, Fortran and Pascal), functional programming languages (e.g. APE and ML), object-oriented languages (e.g. Simula, Smalltalk, C++, and Java), commercial 4GLs (Open Ingres, Informix 4GL and PowerHouse 4GL), and other breeds as well, all offering some degree of user-level abstraction. The need for persistent data has driven an evolution from sequential files to structured files, network databases, hierarchical databases, RDBMSs, and recently ORDBMSs and OODBMSs, offering more controlled and flexible storage, interface, and transactional capabilities on complex objects and structures (Wade 1996a).

Programming in general.

As these tools (programming languages, databases, program development environments, standards, etc.) mature, users will begin to appreciate their value, and more and more off-the-shelf objects (components) will become available. Today program and database development requires knowledge about translating a problem in a universe of discourse (UoD) into some computer language and/or database. Tomorrow it will probably be a question of choosing the right combination of components and integrating them to solve the problem in the UoD. This would eventually result in fewer professional programmers building core objects, and more people working on the actual problem domain of the UoD using pre-built reusable objects.

The components will range from simple components (graphical user interface - GUI and database components) to complex components for solving some kind of UoD-specific need.

Information and databases.

Today information and databases are often centralised in some way, though distributed systems do occur. Tomorrow this will probably change drastically. Databases will be distributed and accessed using some kind of intelligent agents that hide implementation issues from the users. The database(s) will be spread over several sites, accessible over the Internet (or an intranet, or maybe even a combination of these). The databases will conform to some kind of standard (presumably ODMG or SQL3), but the standard will be hidden from the user, who will probably access these information resources using intelligent agents.

Application architecture.

The first applications to be written used a one-layer architecture. As the databases evolved, so did the application architecture, resulting in the now-famous two-layer and three-layer architectures. The three-layer architecture requires knowledge and understanding of a programming language, the UoD, application design and database implementation. I believe that these layers will disappear for the common programmer (they will probably be hidden by intelligent agents, reusable components and persistent languages), resulting in an easier implementation that hides the impedance mismatch, if one exists.

Databases and standards.

When databases first became commercially available no standards existed, but in 1971 the CODASYL standard (sometimes called the DBTG standard) emerged (DBTG 1971). However, the network database community had to wait until 1986 for a formal standard (ANSI X3H2 defined the CODASYL standard in 1986). In 1986 SQL also became a standard for RDBMSs, and in 1993 ODMG put forward its ODMG-93 standard for OODBMSs. Work on the SQL3 standard for ORDBMSs started in 1992 and is expected to be completed in 1999. Hierarchical and network databases are still used, but more and more legacy systems are being converted or reprogrammed to use a newer DBMS, usually an RDBMS. The RDBMS has shown its strength, but it too will eventually disappear, and the two new DBMS concepts will become more and more widely used. The vendors claiming support for SQL3 have become big commercial enterprises, and they stick to the relational data model and to SQL as DML/DDL. Relational algebra and relational calculus are the mathematical foundation of the relational data model, and SQL is not as close as it should be to either. The additions in SQL3 make the differences even bigger, and in the future SQL will, unless another language takes its place, become a problem for the ORDBMS community (Darwen & Date 1995).

OODBMSs, on the other hand, will continue to grab chunks of the market, and as component-based development, information hiding and intelligent agents become more and more widely used, distributed database systems based on OODBMSs will grow rapidly, and sooner or later the centralised database as it is known today is likely to disappear. Eventually some kind of hybrid database that tries to capture the best of both worlds will emerge. This has already happened to some degree with the emerging SQL3 standard (a hybrid of relational and object-oriented database systems), and will continue to happen. Another scenario would be that the OODBMS and ORDBMS communities concentrate on different application areas. This is not likely to happen, because the number of applications that need to store complex data structures will increase, and both the ORDBMS vendors and the OODBMS vendors want a part of that market.

Orthogonal persistent languages.

These programming and database languages remove the impedance mismatch, but it is not likely that an enterprise will use only one programming language. The marketing potential of such a language is therefore limited, but the development of orthogonally persistent Java™ (PJama) may change this. Java (formerly Oak) was almost totally unknown to the community until Netscape bundled it into its Internet browser, and it then became freely available to millions of people. The combination of running the same byte-code on different machines, free distribution, OODBMS capabilities, availability, a familiar language and flexibility is compelling. PJama is an academically driven research effort (though Sun is a major contributor), and it puts focus on several important issues (e.g. the impedance mismatch and OODBMS concepts). I believe that in the future some extended version of PJama or Java, with limited OODBMS functionality, will be bundled with some Internet browser, making it available to millions of people for free, which may rock the database world.

Conclusion.

Database management systems have evolved a great deal, from IMS to ORDBMSs, OODBMSs, and orthogonally persistent programming languages like PJama. The impedance mismatch is shrinking, the level of abstraction is rising, and the architecture of DBMS applications is changing.

I have given a brief description of the database approaches, trying to put them into the correct historical context, and discussed some of the differences and similarities between them. My predictions about the future will have to stand on my own account, though some prominent researchers do, to some extent, agree. The future will become history as time goes by, and in due time we will know whether my predictions come true or not. I have tried to keep the focus on the conceptual differences and have not commented on performance, multi-user environments, distributed database systems, replication mechanisms, proprietary solutions, transaction capabilities, security aspects or frameworks (e.g. CORBA and DCOM/ActiveX). This was done to keep the essay manageable (for my own purposes) and to focus on what I find most interesting: the conceptual paradigms of the approaches.

Abbreviations

4GL      4th Generation Language
ACM      Association for Computing Machinery
ADT      Abstract Data Type
API      Application Programming Interface
ANSI     American National Standards Institute
BLOB     Binary Large Object
CAD      Computer Aided Design
CASE     Computer Aided Software Engineering
CODASYL Conference on Data Systems Languages
DB       Database
DBMS     Database Management System
DDL      Data Definition Language
DML      Data Manipulation Language
HLL      High-Level Language
GUI      Graphical User Interface
IBM      International Business Machines
IDL      Interface Definition Language
ISO      International Organization for Standardization
OIS      Office Information System
ODL      Object Definition Language
ODMG     Object Database Management Group
OMG      Object Management Group
OODBMS   Object-Oriented Database Management System
OODM     Object-Oriented Data Model
OQL      Object Query Language
ORDBMS   Object-Relational Database Management System
QUEL     Query Language
PCR      Parent Child Relationship
RDBMS    Relational Database Management System
SQL      Structured Query Language
UoD      Universe of Discourse

Bibliography

Albano A, Cardelli L and Orsini R. 1985, "Galileo: A Strongly-Typed, Interactive Conceptual
Language", ACM Transactions on Database Systems 10 (2)

Atkinson M P., Bailey P J., Chisholm K J., Cockshott W P. and Morrison R. 1983,
"An approach to Persistent Programming", Computer Journal 26 (4), 360-365

Atkinson M.P., Bancilhon F., DeWitt D., Dittrich K., Maier D. and Zdonik S. 1989.
"The Object-Oriented Database System Manifesto", ALTAIR Technical Report,
No. 30-89, GIP ALTAIR, Le Chesnay, France, (September 1989).

Atkinson M P. and Morrison R. 1995, "Orthogonally Persistent Object Systems",
VLDB Journal 4 (3), 319-401, ISSN: 1066-8888

Atkinson M P, Jordan M J, Daynes L and Spence S. 1996a, "Design Issues for Persistent Java:
a type-safe, object-oriented orthogonally persistent system", Proceedings of POS7,
Cape May, New Jersey (May 1996)

Atkinson M P, Daynes L, Jordan M J, Printezis T and Spence S. 1996b. "An Orthogonally
Persistent Java™", SIGMOD Record, 25 (4), 68-75, (December 1996)

Bachman C W. 1965. "Integrated Data Store", DPMA Quarterly, (January 1965).

Bachman C W. 1973. "The programmer as Navigator", Communications of the ACM,
(November 1973).

Bank D. 1995, "The Java Saga", Wired Magazine 3.12, December 1995

Beeri C. 1990, "Formal Models for Object Oriented Databases", Proceedings of the First
International Conference on Deductive and Object-Oriented Databases 1989
(DOOD'89), 405-430, North-Holland/Elsevier Science Publishers, Holland,
ISBN: 0-444-88433-5

Booch G. 1994, Object-Oriented Analysis and Design with Applications, 2nd edition,
Benjamin/Cummings, Menlo Park, California, ISBN: 0-80535-340-2

Cardelli L. 1984, "Amber", AT&T Bell Labs Technical Memorandum 11271-840924-10TM

Cattell R G G. 1994, Object Database Standard: ODMG-93, Morgan Kaufmann Publishers,
San Francisco, California, ISBN: 1-558-60302-6

Cattell R G G., Barry D, Bartels D, Berler M, Eastman J, Gamerman S, Jordan D, Springer A,
Strickland H and Wade D. 1997, The Object Database Standard: ODMG 2.0,
Morgan Kaufmann Publishers, San Francisco, California, ISBN: 1-55860-463-4

Coad P, Yourdon E. 1991a, Object-Oriented Analysis, Yourdon Press,
ISBN: 0-13629-981-4

Coad P, Yourdon E. 1991b, Object-Oriented Design, Yourdon Press,
ISBN: 0-13630-070-7

Codd E F. 1970, "A Relational Model of Data for Large Shared Data Banks", Communications
of the ACM, 13 (6), 377-387, (June 1970).

Codd E F. 1971a, "Normalized Data Base Structure: A Brief Tutorial", IBM Research Report
RJ 935

Codd E F. 1971b, "Further Normalization of the Data Base Relational Model", IBM Research
Report RJ 909

Codd E F. 1971c, "Data Base Sublanguage Founded on the Relational Calculus", IBM
Research Report RJ 893

Codd E F. 1972, "Relational Completeness of Data Base Sublanguages", IBM Research
Report RJ 987

Date C J. 1990, An introduction to database systems, volume 1, 5th edition, Reading,
Massachusetts, ISBN: 0-201-51381-1

Darwen H, Date C J. 1995, "The Third Manifesto", SIGMOD Record, 24 (1), 39-49,
(March 1995)

DBTG. 1971, "Report of the CODASYL Data Base Task Group", ACM Computing Survey,
(April 1971).

Dearle A, Connor R C H, Brown A L and Morrison R. 1989, "Napier88 - A Database
Programming Language?", Proc. 2nd International Workshop on Database
Programming Languages, 179-195

Dittrich K R. 1994, "Object-Oriented Data Model Concepts", Advances in Object-Oriented
Database Systems, NATO ASI Series, Series F: Computer and System Science,
Vol. 130, 29-45, Springer Verlag, Berlin Heidelberg New York,
ISBN: 3-540-57825-0 / 0-387-57825-0

Elmasri R and Navathe S B. 1994, Fundamentals of database systems, 2nd edition,
Redwood City, California, The Benjamin/Cummings Publishing Company,
ISBN: 0-8053-1753-8

Gray J N. 1997. Database Systems: A Textbook Case of Research Paying Off,
Microsoft Corporation

JCC Consulting Inc. 1998, SQL Standards Home Page

Jordan M. 1996, "Early Experiences with Persistent Java", First International Workshop
on Persistence and Java (PJW1)

Khoshafian S. 1993. Object-Oriented Databases, John Wiley & Sons, Inc., New York,
ISBN: 0471-57058-3

Lausen G and Vossen G. 1997, Models and Languages of Object-Oriented Databases,
Addison-Wesley, Harlow, England, ISBN: 0-201-62431-1

Maier D. 1989, "Why isn't there an object-oriented data model?", Proceedings of the IFIP
11th World Computer Congress (IFIP'89), 793-798, North-Holland/IFIP, Holland,
ISBN: 0-444-88015-1

Melton J. 1995, "Accommodating SQL3 and ODMG", JCC Consulting Inc.

Melton J. 1996, "An SQL3 Snapshot", Proceedings Twelfth International Conference on
Data Engineering, 666-672, IEEE Computer Society Press, California,
ISBN: 0-8186-7240-4

ODMG. 1998, The standard for Object Database

Printezis T, Atkinson M, Daynès L, Spence S and Bailey P. 1997, "The Design of a new
Persistent Object Store for PJama", The Second International Workshop on
Persistence and Java (PJW2)

Reenskaug T, Wold P, Lehne O A. 1995, The OOram Software Engineering Method,
Manning Publication Company, Greenwich, ISBN: 1-88477-710-4

Rumbaugh J, Blaha M, Premerlani W, Eddy F and Lorensen W. 1991, Object-Oriented
Modeling and Design, Prentice Hall, ISBN: 0-1362-9841-9

Servio. 1990, Programming in OPAL, Version 2.0, Servio Logic Development Corporation

Sarkar M and Reiss S P. 1992, "A Data Model for Object-Oriented Databases", Technical
report CS-92-56, Brown University, Department of Computer Science

Spence S and Atkinson M. 1997, "A Scalable Model of Distribution Promoting Autonomy of
and Cooperation between PJava Object Stores", Proceedings of the Hawaii
International Conference on System Sciences (HICSS-30), 1 (7),
IEEE Publication: PR7743-QAJ, ISBN: 0-8186-7743-0

Stonebraker M, Rowe L A, Lindsay B, Gray J, Carey M, Brodie M, Bernstein P, Beech D.
1990. "Third-generation Database System Manifesto", SIGMOD Record, 19 (3),
(September 1990)

Silberschatz A, Stonebraker M, and Ullman J. 1991. "Database systems: Achievements and
Opportunities", Communications of the ACM, 34 (10), 110-120, (October 1991)

TM-language, 1996, University of Twente, Department of Computer Science

Wade A. 1996a, "Information and DBMS Trends, Beyond 2000", Objectivity Inc.

Wade A. 1996b, "SQL/OQL Merger", JCC Consulting Inc.

Zdonik S. 1994, "What Makes Object-Oriented Database Management Systems Different",
Advances in Object-Oriented Database Systems, NATO ASI Series, Series F:
Computer and System Science, Vol. 130, 3-26, Springer Verlag, Berlin Heidelberg
New York, ISBN: 3-540-57825-0 / 0-387-57825-0

Java is a registered trademark of Sun Microsystems Inc. in the USA and other countries.