Data exchange: semantics and query answering

Data exchange is the problem of taking data structured under a source schema and creating an instance of a target schema that reflects the source data as accurately as possible. In this paper, we address foundational and algorithmic issues related to the semantics of data exchange and to the query answering problem in the context of data exchange. These issues arise because, given a source instance, there may be many target instances that satisfy the constraints of the data exchange problem. We give an algebraic specification that selects, among all solutions to the data exchange problem, a special class of solutions that we call universal. We show that a universal solution has no more and no less data than required for data exchange and that it represents the entire space of possible solutions. We then identify fairly general, yet practical, conditions that guarantee the existence of a universal solution and yield algorithms to compute a canonical universal solution efficiently. We adopt the notion of the "certain answers" in indefinite databases for the semantics for query answering in data exchange. We investigate the computational complexity of computing the certain answers in this context and also address other algorithmic issues that arise in data exchange. In particular, we study the problem of computing the certain answers of target queries by simply evaluating them on a canonical universal solution, and we explore the boundary of what queries can and cannot be answered this way, in a data exchange setting.
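As a rough illustration of the last point, the following minimal sketch builds a target instance for a toy setting and evaluates two queries on it. Everything in it is an assumption made for illustration and is not taken from the paper: the source relation `Emp(name)`, the target relation `Mgr(emp, mgr)`, the single constraint "every employee has some manager", the `Null` class standing for unknown values, and the rule that answers containing an unknown value are discarded. That rule is only meant for simple conjunctive queries like the two shown here.

```python
from itertools import count


class Null:
    """A labeled null: a placeholder for a value the source does not determine."""

    def __init__(self, n):
        self.n = n

    def __repr__(self):
        return f"N{self.n}"


def canonical_solution(emp_rows):
    """Apply the hypothetical constraint Emp(x) -> exists y. Mgr(x, y).

    Each source tuple yields one target tuple whose manager is a fresh null,
    so the result says nothing beyond what the constraint requires.
    """
    fresh = count(1)
    return {(name, Null(next(fresh))) for (name,) in sorted(emp_rows)}


def certain_answers(rows):
    """Keep only the answer tuples that contain no nulls."""
    return {row for row in rows if not any(isinstance(v, Null) for v in row)}


source_emp = {("alice",), ("bob",)}
mgr = canonical_solution(source_emp)

# q1(e) :- Mgr(e, m)      "who has a manager?"
q1 = {(e,) for (e, m) in mgr}
# q2(e, m) :- Mgr(e, m)   "who is managed by whom?"
q2 = set(mgr)

print(sorted(certain_answers(q1)))
print(sorted(certain_answers(q2)))
```

In this toy setting, the first query returns both employees, because every solution must give each of them some manager. The second query returns nothing, because no particular manager is forced by the source data.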
