Data integration through database federation

In a large modern enterprise, it is almost inevitable that different parts of the organization will use different systems to produce, store, and search their critical data. Yet, it is only by combining the information from these various systems that the enterprise can realize the full value of the data they contain. Database federation is one approach to data integration in which middleware, consisting of a relational database management system, provides uniform access to a number of heterogeneous data sources. In this paper, we describe the basics of database federation, introduce several styles of database federation, and outline the conditions under which each style of federation should be used. We discuss the benefits of an information integration solution based on database technology, and we demonstrate the utility of the database federation approach through a number of usage scenarios involving IBM's DB2 product.
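As a toy illustration of the idea of uniform access, the sketch below uses a single SQLite connection as a stand-in for the federating relational engine and answers one SQL query over two differently stored sources. Everything in it is an assumption made for illustration: the `orders` and `customers` data, the function names, and the use of SQLite itself. It is not DB2's federation mechanism. In particular, the CSV source is simply copied into a temporary table, whereas a real federated system would typically access the remote source at query time through a wrapper.

```python
import csv
import io
import sqlite3

# Hypothetical non-relational source: a CSV export from a "customers" system.
CUSTOMERS_CSV = "id,name\n1,Acme\n2,Globex\n"


def build_federation():
    """Return a connection that exposes two separate sources under one SQL interface."""
    hub = sqlite3.connect(":memory:")

    # Source 1: a separate relational database, attached under its own schema name.
    hub.execute("ATTACH DATABASE ':memory:' AS orders_src")
    hub.execute(
        "CREATE TABLE orders_src.orders (customer_id INTEGER, amount REAL)"
    )
    hub.executemany(
        "INSERT INTO orders_src.orders VALUES (?, ?)",
        [(1, 120.0), (1, 30.0), (2, 75.5)],
    )

    # Source 2: the CSV data, exposed as a table by a very small "wrapper".
    # Simplification: the rows are copied once rather than fetched per query.
    hub.execute("CREATE TEMP TABLE customers (id INTEGER, name TEXT)")
    reader = csv.DictReader(io.StringIO(CUSTOMERS_CSV))
    hub.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(int(row["id"]), row["name"]) for row in reader],
    )
    return hub


def total_by_customer(hub):
    """Join across both sources with one ordinary SQL statement."""
    return hub.execute(
        """
        SELECT c.name, SUM(o.amount)
        FROM customers AS c
        JOIN orders_src.orders AS o ON o.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
        """
    ).fetchall()


if __name__ == "__main__":
    print(total_by_customer(build_federation()))
```

The point of the sketch is only that the application issues a single query and does not need to know where or how each table is stored.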
