Managing semantic heterogeneity in databases: a theoretical prospective

Modern database management systems essentially solve the problem of accessing and managing large volumes of related data on a single platform, or on a cluster of tightly coupled platforms. But many problems remain when two or more databases need to work together. A fundamental problem is raised by semantic heterogeneity: the fact that data duplicated across multiple databases is represented differently in the underlying database schemas. This tutorial describes fundamental problems raised by semantic heterogeneity and surveys theoretical frameworks that can provide solutions for them. The tutorial considers the following topics: (1) representative architectures for supporting database interoperation; (2) notions for comparing the “information capacity” of database schemas; (3) providing support for read-only integrated views of data, including the virtual and materialized approaches; (4) providing support for read-write integrated views of data, including the issue of workflows on heterogeneous databases; and (5) research and tools for accessing and effectively using meta-data, e.g., to identify the relationships between schemas of different databases.
