Composing schema mappings: second-order dependencies to the rescue

A schema mapping is a specification that describes how data structured under one schema (the source schema) is to be transformed into data structured under a different schema (the target schema). A fundamental problem is composing schema mappings: given two successive schema mappings, derive a schema mapping between the source schema of the first and the target schema of the second that has the same effect as applying the two schema mappings in succession.

In this article, we give a rigorous semantics to the composition of schema mappings and investigate the definability and computational complexity of the composition of two schema mappings. We first study the important case of schema mappings in which the specification is given by a finite set of source-to-target tuple-generating dependencies (source-to-target tgds). We show that the composition of a finite set of full source-to-target tgds with a finite set of tgds is always definable by a finite set of source-to-target tgds, but the composition of a finite set of source-to-target tgds with a finite set of full source-to-target tgds may not be definable by any set (finite or infinite) of source-to-target tgds; furthermore, it may not be definable by any formula of least fixed-point logic, and the associated composition query may be NP-complete.

After this, we introduce a class of existential second-order formulas with function symbols and equalities, which we call second-order tgds, and make a case that they are the “right” language for composing schema mappings. Specifically, we show that second-order tgds form the smallest class (up to logical equivalence) that contains every source-to-target tgd and is closed under conjunction and composition. Allowing equalities in second-order tgds turns out to be of the essence, even though the “obvious” way to define second-order tgds does not require equalities: we show that second-order tgds without equalities are not sufficiently expressive to define the composition of finite sets of source-to-target tgds. Finally, we show that second-order tgds possess good properties for data exchange and query answering: the chase procedure can be extended to second-order tgds so that it produces polynomial-time computable universal solutions in data exchange settings specified by second-order tgds.
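To make the role of function symbols and equalities concrete, the following is a small illustrative sketch of a composition that needs both. The relation names Emp, Mgr1, Mgr, and SelfMgr and the function symbol f are assumptions chosen for illustration; they do not appear in the text above. The first mapping is a single source-to-target tgd, the second consists of two full source-to-target tgds, and the last formula is a second-order tgd describing their composition.

```latex
\begin{align*}
% First mapping: every employee has some (unnamed) manager.
\Sigma_{12}:\quad & \forall e\,\bigl(\mathrm{Emp}(e) \rightarrow \exists m\,\mathrm{Mgr1}(e,m)\bigr) \\[4pt]
% Second mapping: copy the manager relation and record self-managers.
\Sigma_{23}:\quad & \forall e\,\forall m\,\bigl(\mathrm{Mgr1}(e,m) \rightarrow \mathrm{Mgr}(e,m)\bigr) \\
                  & \forall e\,\bigl(\mathrm{Mgr1}(e,e) \rightarrow \mathrm{SelfMgr}(e)\bigr) \\[4pt]
% Composition: f(e) names the manager of e; the equality e = f(e)
% captures the case in which an employee is their own manager.
\Sigma_{13}:\quad & \exists f\,\Bigl(\forall e\,\bigl(\mathrm{Emp}(e) \rightarrow \mathrm{Mgr}(e,f(e))\bigr) \\
                  & \qquad\;\wedge\;\forall e\,\bigl(\mathrm{Emp}(e) \wedge e = f(e) \rightarrow \mathrm{SelfMgr}(e)\bigr)\Bigr)
\end{align*}
```

In this sketch, the existentially quantified function f replaces the first-order quantifier on m, so that the same manager value can be referred to in both conjuncts, and the second conjunct cannot be stated without the equality e = f(e).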
