Data Integration under Integrity Constraints

Data integration systems provide access to a set of heterogeneous, autonomous data sources through a so-called global schema. There are basically two approaches for designing a data integration system. In the global-centric approach, one defines the elements of the global schema as views over the sources, whereas in the local-centric approach, one characterizes the sources as views over the global schema. It is well known that processing queries in the latter approach is similar to query answering with incomplete information and is therefore a complex task. On the other hand, it is a common opinion that query processing is much easier in the former approach. In this paper we show the surprising result that, when the global schema is expressed in the relational model with integrity constraints, even of simple types, the problem of incomplete information implicitly arises, making query processing difficult in the global-centric approach as well. We then focus on global schemas with key and foreign key constraints, a situation that is very common in practice, and we illustrate techniques for effectively answering queries posed to the data integration system.
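
As a minimal sketch of why incomplete information can arise in the global-centric approach, consider a toy global schema with relations student(code, name) and enrolled(code, university), where enrolled.code is a foreign key referencing student.code. The relation names, the source data, and the helper functions below are illustrative assumptions, not taken from the paper, and the code is not the paper's query answering technique. It only shows that evaluating the views directly can miss answers that the foreign key forces to exist.

```python
# Hypothetical source relations (names and data are illustrative assumptions).
s1 = {("12", "anne"), ("15", "bill")}        # s1(code, name)
s2 = {("12", "unibz"), ("16", "uniroma")}    # s2(code, university)


def retrieved_global_db():
    """Global-centric mapping: each global relation is defined as a view
    over the sources (here, simply a copy of one source relation)."""
    return {"student": set(s1), "enrolled": set(s2)}


def naive_answers(db):
    """Evaluate q(x) :- student(x, y) directly on the retrieved data,
    ignoring the integrity constraints of the global schema."""
    return {code for code, _name in db["student"]}


def answers_with_foreign_key(db):
    """Evaluate the same query while respecting the foreign key
    enrolled.code -> student.code. Every enrolled code must occur in
    student, possibly with an unknown name, represented here by None."""
    student = set(db["student"])
    known_codes = {code for code, _name in student}
    for code, _university in db["enrolled"]:
        if code not in known_codes:
            student.add((code, None))  # name exists but is unknown
            known_codes.add(code)
    return {code for code, _name in student}


db = retrieved_global_db()
print(sorted(naive_answers(db)))
print(sorted(answers_with_foreign_key(db)))
```

In this toy instance, the code "16" appears only in the enrollment source. The foreign key implies that a student with that code exists even though no source supplies a name, so the second function returns it while the naive evaluation does not. The unknown name is exactly the kind of incomplete information the abstract refers to.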
