Data exchange in the presence of arithmetic comparisons

Data exchange is the problem of transforming data structured under one schema (the source) into data structured under a different schema (the target). Its emphasis is on materializing a target instance (a solution) that satisfies the relationship between the two schemas. Universal solutions were shown to be the most suitable solutions, mainly because they can be used to answer conjunctive queries posed over the target schema. Attempts to extend this result to more expressive query languages fail, even if we only add inequalities (≠) to conjunctive queries. In this work we study data exchange in the presence of general arithmetic comparisons (<, ≤, >, ≥, =, ≠): (a) We consider queries posed over the target schema that belong to the class of unions of conjunctive queries with arithmetic comparisons (CQACs for short). (b) We exploit arithmetic comparisons to define more expressive data exchange settings, called DEAC settings, whose constraints involve arithmetic comparisons. To this end, we introduce two new classes of dependencies (tgd-ACs and acgds), which capture the need for arithmetic comparisons in source-to-target and target constraints. We show that in DEAC settings the existence-of-solutions problem is in NP. We define a novel chase procedure, called the AC-chase, which takes the form of a tree, and we prove that it produces a universal solution (appropriately defined to deal with arithmetic comparisons). We show that this new concept of universal solution is the right tool for query answering in the case of unions of CQACs. The complexity of computing certain answers for unions of CQACs is shown to be coNP-complete. Moreover, we identify polynomial cases for (a) computing a universal solution and (b) computing certain answers. For that, we introduce the succinct AC-chase, which is a sequence instead of a tree, but whose result is not necessarily a solution. We identify cases where the succinct AC-chase does return a universal solution, and we investigate the syntactic conditions on the query under which query answering takes polynomial time. We show that the latter is feasible even in cases where the result of the chase is not a universal solution.
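
The claim that universal solutions stop being sufficient once inequalities are added can be illustrated with a small sketch. Everything in it is a hypothetical example rather than a construction from this work: the relations E and F, the dependency E(x, y) → ∃z (F(x, z) ∧ F(z, y)), the query q(x, y) :- F(x, z), F(z, y), x ≠ z, and the labeled null `_N1` are all assumptions chosen for illustration.

```python
from itertools import product

def eval_query(instance):
    """Evaluate q(x, y) :- F(x, z), F(z, y), x != z over a set of F-tuples."""
    answers = set()
    for (x, z1), (z2, y) in product(instance, repeat=2):
        if z1 == z2 and x != z1:
            answers.add((x, y))
    return answers

# Hypothetical setting: source fact E(a, b) and the dependency
# E(x, y) -> exists z. F(x, z), F(z, y).
NULL = "_N1"  # labeled null, here treated as a fresh value distinct from constants

# A universal solution for this setting: the null stands for the unknown z.
universal = {("a", NULL), (NULL, "b")}

# Another valid solution, obtained by choosing z = a.
other_solution = {("a", "a"), ("a", "b")}

naive = eval_query(universal)        # evaluation on the universal solution
actual = eval_query(other_solution)  # evaluation on the other solution

print("on universal solution:", naive)
print("on other solution:   ", actual)
```

Evaluating the query on the universal solution yields the null-free tuple (a, b), yet that tuple is not an answer on the second solution, so it cannot be a certain answer. Under these assumptions, this is the kind of gap that a notion of universal solution aware of arithmetic comparisons has to close.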
