GATE: Data Integrity & Modules
Data integrity has several distinct aspects. The first is semantic integrity, which corresponds to the integrity constraints that the DBMS must enforce on the database. The second aspect is related to data access and privacy problems. This is particularly important in a relational database, where relations are dynamically created and deleted. The third and fourth aspects deal with data sharing and with data recovery after a system failure. They are directly related to the notion of an atomic transaction.
Integrity constraints are predicates that must be true of the data stored in the database. Integrity constraints (ICs) can be classified as either static or dynamic. A static constraint concerns a single database state: for instance, all salary values must be between 2000 and 20000. A dynamic constraint concerns the transition between states: for instance, a salary cannot decrease. Relational languages offer features for expressing both kinds of constraint.
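The distinction between the two kinds of constraint can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the database is modelled as a plain dictionary from employee name to salary, and the names static_ok, dynamic_ok and update_salary are hypothetical, not part of any DBMS.

```python
from typing import Dict, Optional

# Bounds taken from the salary example in the text.
SALARY_MIN = 2000
SALARY_MAX = 20000


def static_ok(salary: int) -> bool:
    """Static constraint: inspects a single state (the proposed value)."""
    return SALARY_MIN <= salary <= SALARY_MAX


def dynamic_ok(old_salary: Optional[int], new_salary: int) -> bool:
    """Dynamic constraint: compares two states; a salary cannot decrease."""
    return old_salary is None or new_salary >= old_salary


def update_salary(db: Dict[str, int], employee: str, new_salary: int) -> bool:
    """Apply the update only if both constraints hold; report success."""
    old_salary = db.get(employee)  # None for a new employee
    if static_ok(new_salary) and dynamic_ok(old_salary, new_salary):
        db[employee] = new_salary
        return True
    return False  # update rejected, database state unchanged
```

Note that the static check needs only the new value, whereas the dynamic check needs both the old and the new value, which is why it must be evaluated at update time.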
It is not possible to declare all the functions on an entity, or all of its subtypes, at the time of module creation. In P/FDM, the parser will accept extra declarations prefixed by the header extend and the module name. Once these declarations have been made, it is possible to load values to populate the new functions on the existing instances, or on new subtypes of these instances. One can also create new instances of newly declared types, and populate functions from existing entities into these types.
This is the advantage of using a storage system such as binary relations or triples, which does not keep all the function values on an entity together. It is also a very important feature in practice. Users often want to add new data or relationships onto existing data, and they do not wish to be forced to reload the entire database, as required by early network systems. For example, a scientist may wish to add to a protein database details of the scientific paper or papers where the measurements on the protein were first reported. It is then only necessary to declare a new entity type paper with a suitable key, and to declare a new multi-valued function on proteins that references such entities. The function will initially have null values on existing proteins, until it is populated by the load utility or by explicit update statements.
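A minimal Python sketch shows why triple storage makes this kind of extension cheap. It is not P/FDM's storage layer: the class TripleStore and its methods declare, add and values are hypothetical names chosen for illustration, and entities are identified by plain strings.

```python
from collections import defaultdict


class TripleStore:
    """Keeps each function value as a separate (entity, function, value) triple."""

    def __init__(self):
        self.declared = set()             # names of declared functions
        self.triples = defaultdict(set)   # (entity, function) -> set of values

    def declare(self, function):
        """Declare a new function; existing triples are not touched."""
        self.declared.add(function)

    def add(self, entity, function, value):
        """Store one value of a (possibly multi-valued) function."""
        if function not in self.declared:
            raise KeyError(f"function {function!r} has not been declared")
        self.triples[(entity, function)].add(value)

    def values(self, entity, function):
        """Return the set of values; empty if the function is unpopulated."""
        # .get avoids creating empty entries as a side effect of reading.
        return set(self.triples.get((entity, function), set()))


store = TripleStore()
store.declare("name")
store.add("protein1", "name", "lysozyme")

# Later extension: a new multi-valued function on existing proteins.
store.declare("reported_in")
store.add("protein1", "reported_in", "paper1")
```

Because no record holds all the function values of an entity together, declaring reported_in changes nothing that is already stored, and an unpopulated function simply yields an empty result.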
Modularization of Procedures
The module scheme outlined above provides for a simple form of concurrent sharing of data between multiple readers, each accessing a number of shared modules and also using extra data stored in their own private and temporary module. However, it does not provide one of the common features of modules in programming, namely the hiding of names and the partitioning of the name space.
Modules of the kind used in Ada and Modula-2 allow one to declare a collection of procedures and functions that reference each other, but to hide the names of some of these functions from other modules by not putting them on an export list. Likewise, certain record type declarations can be hidden, which effectively makes the records concerned meaningless to an outside program and provides a form of security. Finally, one can use a scoping mechanism, so that a locally defined function overrides a globally defined function of the same name that would otherwise be imported from another module.
As an example, consider an outline of an Ada package for working with sorted lists. The package specification says which procedure names are available to outsiders. Other procedures may be declared inside the package body, but neither they nor any local variables in that body are visible to outsiders.
The package signature also declares the types of exported procedure parameters. If one is of type limited private, then its constructor and selectors are known only to the package, and all a user can do is take instances created by some package procedures and pass them as parameters to other package procedures. This is clearly a form of encapsulation.
Where a package parameter is declared following the keyword generic, the package is incomplete and the name of the actual type must be supplied when the package is instantiated; specific instantiations of the package procedures are then created, which work only on the type specified by the user. This is clearly useful for a list sorting package, which might be used to sort lists of strings, or of integers, etc. A more modern language such as ML (Milner et al., 1989) actually allows package parameters to be whole signatures, including characteristic functions on a type. For example, an ordering function Leq(X, Y) is clearly useful when sorting.
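The idea of supplying an ordering function as a package parameter can be approximated in Python with a function that returns procedures specialised to one ordering. This is a loose analogue, not Ada or ML: make_sorted_list_package, insert and sort_list are hypothetical names, and Python checks nothing about the element type until the procedures are called.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def make_sorted_list_package(
    leq: Callable[[T, T], bool],
) -> Tuple[Callable[[List[T], T], List[T]], Callable[[List[T]], List[T]]]:
    """Instantiate the 'package' for one ordering function leq(x, y)."""

    def insert(items: List[T], x: T) -> List[T]:
        # items is assumed sorted; x goes before the first y with leq(x, y).
        for i, y in enumerate(items):
            if leq(x, y):
                return items[:i] + [x] + items[i:]
        return items + [x]

    def sort_list(items: List[T]) -> List[T]:
        # Insertion sort built on insert; returns a new list.
        result: List[T] = []
        for x in items:
            result = insert(result, x)
        return result

    return insert, sort_list


# Two instantiations: one for integers, one for case-insensitive strings.
insert_int, sort_ints = make_sorted_list_package(lambda x, y: x <= y)
insert_str, sort_strs = make_sorted_list_package(
    lambda x, y: x.lower() <= y.lower()
)
```

Each call to make_sorted_list_package plays the role of one instantiation of the generic package: the same sorting code is reused, and only the ordering supplied by the user differs.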
Information hiding, particularly of the names of functions or of entity types, is traditionally performed in databases by subschema definitions, which also have passwords associated with them and provide a view on a collection of modules for a particular application program. It would be relatively straightforward to add such a feature to P/FDM. The system would need to check various conditions, in particular that if an entity type was hidden, then any function producing a result of that type was also hidden.
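The consistency condition on hidden entity types is easy to state as a check over the schema. The sketch below assumes, purely for illustration, that the schema is a dictionary from function name to a pair of argument type and result type; the function hidden_type_leaks is a hypothetical name and not a P/FDM facility.

```python
def hidden_type_leaks(functions, hidden_types, hidden_functions):
    """Return the visible functions whose result type is hidden.

    functions: dict mapping function name -> (argument type, result type).
    hidden_types: set of entity type names hidden by the subschema.
    hidden_functions: set of function names hidden by the subschema.
    An empty result means the subschema satisfies the condition.
    """
    return sorted(
        name
        for name, (_arg_type, result_type) in functions.items()
        if result_type in hidden_types and name not in hidden_functions
    )


schema = {
    "name": ("protein", "string"),
    "reported_in": ("protein", "paper"),
    "title": ("paper", "string"),
}

# Hiding the type paper without hiding reported_in violates the condition.
leaks = hidden_type_leaks(schema, {"paper"}, {"title"})
```

A subschema definition would be rejected, or the offending functions hidden automatically, whenever this list is non-empty.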
The scoping of function names is more awkward. It would in theory be possible to have two different descriptors referring to the same function on the same entity type but having their definitions held in different modules. Currently a function call in P/FDM has to look up the descriptor, so it could check the module location and then choose the function definition that is in the same module as the method currently being executed. A scheme similar to this has been implemented by Moffat and Gray (1986) in Persistent Prolog (Perlog).
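The lookup rule described here can be sketched as follows. The descriptor layout (a tuple of module, function, entity type and definition) and the name resolve are assumptions made for the example, as is the fallback used when the current module holds no definition of its own; the text does not specify that case.

```python
def resolve(descriptors, function, entity_type, current_module):
    """Choose the definition of function on entity_type for a call made
    from a method that lives in current_module.

    descriptors: list of (module, function, entity_type, definition).
    """
    candidates = [
        d for d in descriptors if d[1] == function and d[2] == entity_type
    ]
    if not candidates:
        raise LookupError(f"no function {function} on {entity_type}")
    # Prefer the definition held in the same module as the executing method.
    for module, _function, _entity_type, definition in candidates:
        if module == current_module:
            return definition
    # Assumption: with no local definition, a unique global one is used.
    if len(candidates) == 1:
        return candidates[0][3]
    raise LookupError(f"ambiguous function {function} on {entity_type}")


descriptors = [
    ("m1", "area", "shape", "area as defined in m1"),
    ("m2", "area", "shape", "area as defined in m2"),
    ("m1", "name", "shape", "name as defined in m1"),
]
```

The same call to area on shape therefore resolves differently depending on where the calling method is stored, which is the source of the confusion for users that the next paragraph discusses.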
However, such a scheme is not easy for a user to understand, and it is probably simpler to prohibit the storage of functions of the same name on the same entity type in different modules, by checking the descriptors of all modules when a new function is declared. The same prohibition applies to entity class names; it is highly undesirable to have two entity classes of the same name declared in different modules. Thus, if data exported from another database is to be loaded in, but has a clash with existing names, then the names will need to be systematically changed, either by editing before loading the data, or by using a system-supported renaming mechanism as in Perlog (Moffat and Gray, 1988).
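Both the prohibition and the systematic renaming can be sketched briefly. The data layout (a dictionary from module name to a set of function and entity type pairs) and the names declare_function and rename_clashes are hypothetical; the renaming shown simply appends a suffix and is not the Perlog mechanism.

```python
def declare_function(modules, module, function, entity_type):
    """Declare function on entity_type in module, unless some module
    (including this one) already declares the same combination."""
    for other, declared in modules.items():
        if (function, entity_type) in declared:
            raise ValueError(
                f"{function} on {entity_type} is already declared in {other}"
            )
    modules.setdefault(module, set()).add((function, entity_type))


def rename_clashes(incoming, existing_names, suffix):
    """Map each incoming name to itself, or to name + suffix on a clash.

    The caller must pick a suffix that does not create new clashes.
    """
    return {
        name: (name + suffix if name in existing_names else name)
        for name in incoming
    }


modules = {"m1": {("area", "shape")}}
declare_function(modules, "m2", "perimeter", "shape")   # accepted

mapping = rename_clashes(["paper", "author"], {"paper"}, "_imported")
```

Checking every module on each declaration is more expensive than a purely local check, but declarations are rare compared with function calls, so the cost falls where it matters least.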
This discussion illustrates a difference in philosophy between the database approach and that of object-oriented programming. The database approach is to have a central conceptual schema containing all the entity and function names, with all entity and function-entity combinations having distinct names. The object-oriented approach is to hide methods so completely that details of other methods called and object classes accessed during method execution are not available. Basically, databases are about information sharing whereas object orientation is partly about information hiding, and we need a compromise between these two aims. The compromise solution proposed here favors information sharing, but a more extensive use of procedures and methods will require a more elaborate scheme for name hiding.