AP: Database Concepts
Databases are collections of structured data stored on one or several computers to serve several data processing applications. A database may be defined as a computerized collection of interrelated data stored such that:
- The data are shared among different users and applications, but a common and controlled approach is used for inserting, deleting, modifying and retrieving data.
- Terminal users and application programs which access the data do not need to be aware of the detailed storage structure.
A Database Management System (DBMS) is a software system capable of supporting and managing any number of independent databases.
The data in a database are usually logically organized according to some data model. Data models are central to database systems. They provide a conceptual basis for design, a formal basis for defining unambiguously the items of data and their interrelationships, and a framework for implementation. Database systems are generally based on one of three data models, namely:
- The relational data model: This model is based on the mathematical notion of a relation. In this model both the data objects and their relationships are represented by two-dimensional tables.
- The network data model: In this model the database is represented by a directed graph, the nodes of which represent the data objects (record types), and the arcs of which define the relationships among the data objects.
- The hierarchical data model: In this model the database is represented by tree structures. If the data are not naturally hierarchical (as is usually the case), then this model imposes quite severe restrictions on the data modeler.
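A minimal sketch of the relational idea from the first bullet, assuming SQLite through Python's built-in `sqlite3` module (the text names no particular system). Both the data objects (suppliers, parts) and the relationship between them (which supplier supplies which part, and how many) are stored as ordinary two-dimensional tables. The table and column names are hypothetical.

```python
import sqlite3

def build_supplier_db() -> sqlite3.Connection:
    """Create an in-memory relational database: two entity tables plus one relationship table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE supplier (sno TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE part     (pno TEXT PRIMARY KEY, pname TEXT);
        -- The relationship is itself just another table of rows.
        CREATE TABLE supplies (sno TEXT REFERENCES supplier(sno),
                               pno TEXT REFERENCES part(pno),
                               qty INTEGER,
                               PRIMARY KEY (sno, pno));
    """)
    conn.executemany("INSERT INTO supplier VALUES (?, ?)",
                     [("S1", "Smith"), ("S2", "Jones")])
    conn.executemany("INSERT INTO part VALUES (?, ?)",
                     [("P1", "Nut"), ("P2", "Bolt")])
    # Many-to-many: part P1 has two suppliers. A pure tree structure would have
    # to repeat the P1 record under each supplier to express this.
    conn.executemany("INSERT INTO supplies VALUES (?, ?, ?)",
                     [("S1", "P1", 300), ("S1", "P2", 200), ("S2", "P1", 400)])
    conn.commit()
    return conn

conn = build_supplier_db()
rows = conn.execute("""
    SELECT s.name, p.pname, x.qty
    FROM supplies AS x
    JOIN supplier AS s ON s.sno = x.sno
    JOIN part     AS p ON p.pno = x.pno
    ORDER BY s.name, p.pname
""").fetchall()
print(rows)  # [('Jones', 'Nut', 400), ('Smith', 'Bolt', 200), ('Smith', 'Nut', 300)]
```

The relationship table needs no special structure: it is queried with the same join operation as any other table.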
The main difficulty is that network and hierarchical systems provide only a minimal amount of data independence, the property whereby the logical database structure is insulated from the physical database organization. In contrast, the relational specification of a logical data model is independent of implementation considerations.
Support for data independence is arguably the most important feature of a database system. The ability to separate the logical database definition from its physical storage organization makes it easier to redefine and restructure the database. That is, the storage structures and access strategies may be altered in response to changing requirements without having to alter existing applications. This property of database systems is consistent with the modern approach to software engineering, whereby large software systems are constructed from modular units which serve to hide inessential details from the surrounding program environment.
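Physical data independence can be seen in miniature with SQLite (again an assumption, via Python's `sqlite3`): the application's query text stays fixed while an index, a purely physical access structure, is added underneath it. The answer is unchanged; only the plan SQLite chooses differs. Table, column and index names are hypothetical, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (eno INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany("INSERT INTO employee (name, dept) VALUES (?, ?)",
                 [("Ada", "R&D"), ("Ben", "Sales"), ("Cy", "R&D")])

# The "application": a query that says nothing about storage or access paths.
APP_QUERY = "SELECT name FROM employee WHERE dept = ? ORDER BY name"

def plan(sql: str, params: tuple = ()) -> str:
    """Return SQLite's query plan as text (the 'detail' text is the last column of each row)."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params))

before = conn.execute(APP_QUERY, ("R&D",)).fetchall()
plan_before = plan(APP_QUERY, ("R&D",))

# Physical reorganization: add a new access path. The application query is untouched.
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")

after = conn.execute(APP_QUERY, ("R&D",)).fetchall()
plan_after = plan(APP_QUERY, ("R&D",))

print(before == after)  # True: same answer
print(plan_before)      # typically a full scan of employee
print(plan_after)       # typically a search using idx_employee_dept
```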
There are in fact two distinct levels of data independence:
- Physical data independence, which, as described above, insulates applications from the underlying physical storage organization of the data.
- Logical data independence, which insulates applications from changes made to the logical organization of data, e.g. the addition of new record types or relationships.
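Logical data independence at its simplest can be sketched with SQLite (an assumption; schema hypothetical): an application that names exactly the columns it needs keeps working after a new field and a new record type, with a relationship to the old one, are added to the logical model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cno INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customer (name) VALUES (?)", [("Ada",), ("Ben",)])

def existing_application(c: sqlite3.Connection) -> list:
    """A hypothetical application that names exactly the columns it uses."""
    return c.execute("SELECT cno, name FROM customer ORDER BY cno").fetchall()

before = existing_application(conn)

# Logical reorganization: a new field, plus a new record type related to customer.
conn.execute("ALTER TABLE customer ADD COLUMN email TEXT")
conn.execute("""CREATE TABLE orders (ono INTEGER PRIMARY KEY,
                                     cno INTEGER REFERENCES customer(cno),
                                     total REAL)""")
conn.execute("INSERT INTO orders (cno, total) VALUES (1, 9.5)")

after = existing_application(conn)
print(before == after)  # True: the old application is unaffected
```

The protection here is only partial: an application written with `SELECT *` would suddenly receive an extra column, which is one reason per-application views are useful.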
Logical data independence may be effected by providing each application with its own view of the database. This view may simply be a subset of the overall logical data model, or it may be a derived structure tailored to the application. In the latter case, for example, record or field types may be renamed or omitted, or multiple fields in the global model may be combined into a single field in the view. Implementing such a view requires that it be mapped onto the global data model.
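The three kinds of tailoring just described (renaming, omitting, combining fields) map directly onto an SQL view. A minimal sketch assuming SQLite; the `personnel` table and `phone_list` view are hypothetical. The DBMS performs the mapping from the view to the global model whenever the view is queried.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Global (logical) model: a hypothetical personnel record.
conn.execute("""CREATE TABLE personnel (
    emp_no INTEGER PRIMARY KEY,
    first_name TEXT, last_name TEXT,
    salary REAL, dept TEXT)""")
conn.executemany(
    "INSERT INTO personnel VALUES (?, ?, ?, ?, ?)",
    [(1, "Ada", "Lovelace", 5000.0, "R&D"), (2, "Ben", "Okri", 4200.0, "Sales")])

# One application's view: emp_no renamed to id, salary omitted,
# and two global fields combined into a single view field (full_name).
conn.execute("""CREATE VIEW phone_list AS
    SELECT emp_no AS id,
           first_name || ' ' || last_name AS full_name,
           dept
    FROM personnel""")

# The application queries only its view; SQLite maps it onto the global model.
rows = conn.execute("SELECT id, full_name FROM phone_list ORDER BY id").fetchall()
print(rows)  # [(1, 'Ada Lovelace'), (2, 'Ben Okri')]
```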
One of the primary functions of a database system is to maintain the integrity of the database, i.e. to preserve the consistency and correctness of the data. This is especially important in large multi-user environments, in which the system must preserve integrity in the face of problems such as errors in updating programs, system software failures, hardware failures, and the conflicting requirements of concurrently executing transactions.
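Two common integrity mechanisms are a declarative constraint that rejects an erroneous update, and a transaction that makes a multi-step change all-or-nothing. A minimal sketch assuming SQLite; the `account` table and `transfer` helper are hypothetical. The credit is applied before the debit on purpose, so a rejected debit shows the rollback undoing a partial change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Declarative integrity rule: balances may never go negative.
conn.execute("""CREATE TABLE account (
    acc_no TEXT PRIMARY KEY,
    balance INTEGER NOT NULL CHECK (balance >= 0))""")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(c: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move an amount atomically: either both updates take effect or neither does."""
    try:
        with c:  # commits on success, rolls back if an exception escapes the block
            c.execute("UPDATE account SET balance = balance + ? WHERE acc_no = ?", (amount, dst))
            c.execute("UPDATE account SET balance = balance - ? WHERE acc_no = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint fired; the credit to dst was rolled back

ok = transfer(conn, "A", "B", 30)    # valid
bad = transfer(conn, "B", "A", 500)  # would make B negative, so it is rejected
balances = dict(conn.execute("SELECT acc_no, balance FROM account"))
print(ok, bad, balances)  # True False {'A': 70, 'B': 80}
```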
The preservation of consistency and correctness is, of course, central to the software engineering approach to the construction and maintenance of large software systems. Thus many valuable concepts and techniques developed within the framework of software engineering may be applied to the problem of maintaining integrity in database systems.
Flexibility of access to data is an important requirement in modern data processing environments. Database systems are commonly implemented to support only predefined access paths, i.e. those paths which were foreseen as necessary at the time of implementation. This is highly unsatisfactory for the user, particularly in an environment where data requirements change continuously. A user may find that, although the data he requires is stored or can be derived from the stored data, the restrictive nature of the query processing software makes it difficult for him to retrieve it.
A great advantage of the relational model of data is that, at the logical level, it places no restrictions on the access paths that may be followed. Thus, in a system which offers a high-level relational interface, complex queries may be formulated with minimal effort. This, of course, places a great burden on the implementer to provide efficient query processing.
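Neither query below relies on an access path declared in advance: each states what is wanted, traversing the employee/department relationship in whichever direction the question requires. A sketch assuming SQLite; schema and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (dno TEXT PRIMARY KEY, dname TEXT, city TEXT);
    CREATE TABLE emp  (eno INTEGER PRIMARY KEY, name TEXT, dno TEXT, salary INTEGER);
    INSERT INTO dept VALUES ('D1', 'Research', 'Leeds'), ('D2', 'Sales', 'York');
    INSERT INTO emp (name, dno, salary) VALUES
        ('Ada', 'D1', 5000), ('Ben', 'D2', 4200), ('Cy', 'D1', 3900);
""")

# Ad hoc question 1: average salary per city (from employees to their departments).
q1 = conn.execute("""
    SELECT d.city, AVG(e.salary)
    FROM emp e JOIN dept d ON d.dno = e.dno
    GROUP BY d.city ORDER BY d.city""").fetchall()

# Ad hoc question 2: the reverse direction, departments with anyone earning under 4000.
q2 = conn.execute("""
    SELECT dname FROM dept d
    WHERE EXISTS (SELECT 1 FROM emp e WHERE e.dno = d.dno AND e.salary < 4000)""").fetchall()

print(q1)  # [('Leeds', 4450.0), ('York', 4200.0)]
print(q2)  # [('Research',)]
```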
Optimization of Performance and Efficiency
A database system must offer a high standard of performance and efficiency, especially for online query processing. Early relational systems were seriously deficient in this area, but great advances have been made in recent years. Performance may be improved at two levels: first, by providing efficient data access structures at the physical storage level; and second, by restructuring users' queries into a form which is more amenable to efficient implementation. These two approaches are closely interrelated, since high-level optimization techniques for evaluating queries must take account of the existing access paths and their properties.
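One classic example of restructuring a query is selection pushdown: a condition that mentions only one relation's attributes is applied before a join rather than after it, shrinking the intermediate work. This pure-Python sketch uses hypothetical data, a naive nested-loop join, and the number of tuple pairs compared as a rough cost measure; it illustrates the idea rather than how any particular optimizer works. Real optimizers also weigh the available access paths, as the paragraph above notes.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

def join(r: List[Row], s: List[Row], key: str, cost: List[int]) -> List[Row]:
    """Naive nested-loop equi-join on `key`; cost[0] counts tuple pairs compared."""
    out = []
    for a in r:
        for b in s:
            cost[0] += 1
            if a[key] == b[key]:
                out.append({**a, **b})
    return out

def select(r: List[Row], pred: Callable[[Row], bool]) -> List[Row]:
    """Relational selection: keep only the rows satisfying pred."""
    return [row for row in r if pred(row)]

def in_leeds(row: Row) -> bool:
    return row["city"] == "Leeds"

# Hypothetical data: 100 employees spread over 10 departments, one of them in Leeds.
emp = [{"eno": i, "dno": i % 10, "salary": 3000 + 10 * i} for i in range(100)]
dept = [{"dno": d, "city": "Leeds" if d == 3 else "York"} for d in range(10)]

# Plan A, as a user might phrase it: join everything, then select.
cost_a = [0]
plan_a = select(join(emp, dept, "dno", cost_a), in_leeds)

# Plan B, rewritten: the condition only mentions dept, so apply it to dept first.
cost_b = [0]
plan_b = join(emp, select(dept, in_leeds), "dno", cost_b)

print(plan_a == plan_b, cost_a[0], cost_b[0])  # True 1000 100
```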
A database system must provide mechanisms for protecting data from unauthorized intrusion, whether accidental or malicious. This is especially important if the database contains sensitive information, and for such systems the techniques which may be employed to ensure confidentiality and security are quite complex (Date, 1983). However, some aspects of security are relevant even in simple, single-user database systems, for example protecting the database against accidental erasure or corruption, and using passwords to inhibit unauthorized access to database files.
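For the simpler protection mentioned last, guarding against accidental erasure, a routine backup is the basic tool. A minimal sketch assuming SQLite, whose online backup API Python exposes as `Connection.backup` (Python 3.7 and later). The helper names, table and backup file name are hypothetical; password-based access control is not shown.

```python
import os
import sqlite3
import tempfile

def backup_database(src: sqlite3.Connection, dest_path: str) -> None:
    """Copy a live SQLite database into the file dest_path using the online backup API."""
    dest = sqlite3.connect(dest_path)
    try:
        src.backup(dest)
    finally:
        dest.close()

def restore_database(backup_path: str) -> sqlite3.Connection:
    """Load a backup file into a fresh in-memory connection."""
    restored = sqlite3.connect(":memory:")
    src = sqlite3.connect(backup_path)
    try:
        src.backup(restored)
    finally:
        src.close()
    return restored

live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE note (id INTEGER PRIMARY KEY, body TEXT)")
live.execute("INSERT INTO note (body) VALUES ('keep me')")
live.commit()

path = os.path.join(tempfile.mkdtemp(), "notes_backup.db")
backup_database(live, path)

live.execute("DELETE FROM note")  # the "accidental erasure"
live.commit()

restored = restore_database(path)
print(restored.execute("SELECT body FROM note").fetchall())  # [('keep me',)]
```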
Administration and Control
Database systems, in common with most software products, are generally used not by their designers or implementers but by people who were not involved in the development of the system and who may have little or no knowledge of database technology. A database system must also respond to changes in its operating environment and to changing user requirements. Thus, the operation and maintenance of a large, multi-user database system requires centralized administration and control, and extensive documentation.
The person or group responsible for supervising the day-to-day management of a large multi-user database system is often referred to as the Database Administrator. The primary duties of this administrator involve coordinating conflicting access requirements, monitoring performance, and supervising backup and recovery services. For both user and administrator, many commercial database management systems now provide extensive Data Dictionary facilities. These are among the most essential data management tools, providing information on data resources and database usage, and supporting data administration, system development, documentation and maintenance.
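Commercial data dictionary facilities vary widely; as a small stand-in, SQLite keeps its own catalog in the `sqlite_master` table and describes columns through `PRAGMA table_info`. The sketch below assembles those into a per-table summary of the kind an administrator might consult. The schema is hypothetical, and `data_dictionary` is a helper invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (dno TEXT PRIMARY KEY, dname TEXT NOT NULL);
    CREATE TABLE emp  (eno INTEGER PRIMARY KEY, name TEXT, dno TEXT REFERENCES dept(dno));
    CREATE INDEX idx_emp_dno ON emp (dno);
""")

def data_dictionary(c: sqlite3.Connection) -> dict:
    """Describe every table: its columns (name, declared type, NOT NULL, part of PK) and indexes."""
    catalog = {}
    tables = c.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name").fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        # The table name comes from the catalog itself, not from user input.
        columns = [(name, ctype, bool(notnull), bool(pk))
                   for _cid, name, ctype, notnull, _dflt, pk
                   in c.execute(f"PRAGMA table_info({table})")]
        indexes = [name for (name,) in c.execute(
            "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = ? ORDER BY name",
            (table,))]
        catalog[table] = {"columns": columns, "indexes": indexes}
    return catalog

dd = data_dictionary(conn)
for table, info in dd.items():
    print(table, info)
```

Automatically created indexes (for example the one SQLite builds for the `TEXT PRIMARY KEY` of `dept`) also appear in the catalog, under names SQLite generates itself.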