Interoperating database systems: issues and architectures

Information systems have been developed in recent years to meet particular local business needs. The information residing in those local systems represents a major business asset. However, business needs change during the lifetime of an information system. The changes can arise either through the discovery of new business needs or through the restructuring of business departments. In many cases this leads to the need for several information systems to be used in a collaborative way. The outcome is that business has to contend with information systems which are distributed, autonomous and heterogeneous. The systems are distributed in the sense that they physically reside on different computers, possibly separated by substantial distances. The systems are autonomous because they have been acquired and operated independently; indeed, even when they are required to co-operate, operational independence may still be required. The systems are heterogeneous because they have been designed separately, implemented on different hardware and software platforms, and used by operations staff with different world views.

This report considers the architectures proposed to allow these distributed, autonomous, and heterogeneous systems to interoperate. Specifically we consider:

1. Distributed Database Systems: These systems are tightly coupled and assume a top-down design and implementation. The distributed database management systems are available as proprietary products. These systems are not appropriate where there is design autonomy and are not readily applicable to platform heterogeneity due to restrictions in the available gateways to different DBMSs.

2. Global Schema Multidatabases: These systems attempt to provide an overarching global schema for the set of databases. They are tightly coupled. The global schema itself may represent only a subset of the local schema definitions. A particular problem is the lack of resilience to changes in the local schemas. More generally, the heterogeneity makes this architecture unsuitable for all but very simple cases.

3. Federated Database Systems: These systems attempt to provide interoperation by tailoring different shared elements of the schema to specific classes of user. The sharing can be either tightly coupled or loosely coupled.

We then discuss the tightly and loosely coupled approaches to federation by considering in detail two important research projects, viz:

1. The IRO-DB project [BFHK94] as an example of a tightly coupled federated architecture applied to the interoperation of object oriented and relational databases.

2. The MDSL project [LMR90] as an example of a loosely coupled federated architecture created using a multidatabase language.

The specific problems associated with heterogeneities in schema and semantics are considered in a parallel report [HDR97].

In the early 1960s, database systems were proposed as a solution to the problem of shared access to heterogeneous data files created by multiple applications in a centralised environment. These data files were difficult to manage; they frequently contained duplications, inconsistencies, redundancies, and various types of heterogeneity at both structural and data levels. To overcome these problems, the autonomous files were replaced with a centrally defined database which was under the centralised control of a database management system (DBMS). The DBMS acted as a layer of abstraction between the centrally defined model of the relevant organisation’s data requirements, and the applications which used or manipulated the data.

Today, many independent databases exist, particularly in large organisations and/or those which may have undergone substantial commercial or structural changes such as mergers and take-overs. It is often the case that different business units within the same organisational context capture and store the same data, related data, or even the same data viewed from different perspectives.
These databases often serve critical functions and embody significant resource investments within their business units. Thus, in many cases the preservation of the environments of these databases is essential, whilst on the other hand, as the information requirements and sophistication of the users evolve, a clear need to share or integrate data at a global level can be identified.

The above scenario demonstrates the need to facilitate database interoperation within the context of an organisation. There is also a need to provide interoperation on an inter-organisational basis, particularly in the domain of public information systems. Examples are databases specifically intended to market an organisation’s products or services to the public. Consider the case where airline companies provide public information systems documenting their schedules, ticket costs, flight departure airports and so on. In the absence of interoperation, a prospective ticket buyer looking for the best deal would have to query each airline’s database separately, and then manually compare the individual results. What this situation requires is a facility to execute database queries which access data from more than one independent database system, or in short, queries which involve database interoperation.

In this paper we discuss the major issues in the field of multidatabase systems. A taxonomy of interoperable database systems is presented along with a brief discussion of each of the solutions identified. Federated database systems, which offer the most promising solution to the problem of providing interoperation without sacrificing component database autonomy, are examined in some detail along with appropriate examples. Finally, we identify some of the problems which will form the basis of future research in this area.
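
The airline scenario described above can be illustrated with a minimal sketch. It is not taken from any of the systems discussed in this paper. It assumes two hypothetical airline databases, modelled as in-memory SQLite databases, with invented table names, column names and fares. Each database keeps its own schema and its own price representation, and a small mediating function issues a separate query to each source, converts the results to a common form, and compares them on behalf of the ticket buyer.

```python
import sqlite3


def make_airline_a():
    """Hypothetical airline A: table 'flights', price in whole currency units."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE flights (origin TEXT, dest TEXT, price REAL)")
    db.executemany(
        "INSERT INTO flights VALUES (?, ?, ?)",
        [("LHR", "JFK", 420.0), ("LHR", "CDG", 95.0)],
    )
    return db


def make_airline_b():
    """Hypothetical airline B: table 'schedule', fare in hundredths of a unit."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE schedule (dep TEXT, arr TEXT, fare_hundredths INTEGER)")
    db.executemany(
        "INSERT INTO schedule VALUES (?, ?, ?)",
        [("LHR", "JFK", 39900), ("LHR", "CDG", 11000)],
    )
    return db


# For each autonomous source: its connection, a query written against its own
# local schema, and a function mapping the local price to the common unit.
SOURCES = {
    "A": (
        make_airline_a(),
        "SELECT price FROM flights WHERE origin = ? AND dest = ?",
        lambda value: float(value),
    ),
    "B": (
        make_airline_b(),
        "SELECT fare_hundredths FROM schedule WHERE dep = ? AND arr = ?",
        lambda value: value / 100.0,
    ),
}


def cheapest_fare(origin, dest):
    """Query every source separately and return (price, airline) of the best offer."""
    offers = []
    for name, (db, sql, to_common) in SOURCES.items():
        for (raw_price,) in db.execute(sql, (origin, dest)):
            offers.append((to_common(raw_price), name))
    return min(offers) if offers else None


print(cheapest_fare("LHR", "JFK"))
print(cheapest_fare("LHR", "CDG"))
```

Even in this toy setting, the per-source query text and conversion function show where schema and representation heterogeneity have to be resolved, while the local databases themselves remain unchanged.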
2 ISSUES OF MULTIDATABASE SYSTEMS

In this section we consider the three major issues which arise in multidatabase systems, namely distribution, autonomy and heterogeneity [SL90]. We present these issues in