Towards a theory of schema-mapping optimization

A schema mapping is a high-level specification that describes the relationship between two database schemas. As schema mappings constitute the essential building blocks of data exchange and data integration, an extensive investigation of the foundations of schema mappings has been carried out in recent years. Even though several different aspects of schema mappings have been explored in considerable depth, the study of schema-mapping optimization remains largely uncharted territory to date. In this paper, we lay the foundation for the development of a theory of schema-mapping optimization. Since schema mappings are constructs that live at the logical level of information integration systems, the first step is to introduce concepts and to develop techniques for transforming schema mappings to "equivalent" ones that are more manageable from the standpoint of data exchange or of some other data interoperability task. In turn, this has to start by introducing and studying suitable notions of "equivalence" between schema mappings. To this effect, we introduce the concept of data-exchange equivalence and the concept of conjunctive-query equivalence. These two concepts of equivalence are natural relaxations of the classical notion of logical equivalence; the first captures indistinguishability for data-exchange purposes, while the second captures indistinguishability for conjunctive-query-answering purposes. Moreover, they coincide with logical equivalence on schema mappings specified by source-to-target tuple-generating dependencies (s-t tgds), but differ on richer classes of dependencies, such as second-order tuple-generating dependencies (SO tgds) and sets of s-t tgds and target tuple-generating dependencies (target tgds). 
After exploring the basic properties of these three notions of equivalence between schema mappings (logical, data-exchange, and conjunctive-query equivalence), we focus on the following question: under what conditions is a schema mapping conjunctive-query equivalent to a schema mapping specified by a finite set of s-t tgds? We answer this question by obtaining complete characterizations for schema mappings that are specified by an SO tgd and for schema mappings that are specified by a finite set of s-t tgds and target tgds and have a terminating chase. These characterizations involve boundedness properties of the cores of universal solutions.
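
The three notions of equivalence can be made concrete with a short sketch of how they are commonly formalized. This is not a quotation of the paper's definitions, and the full text may treat edge cases (for example, source instances with no solutions) differently. The notation is an assumption introduced here for illustration: $\mathrm{Sol}(\mathcal{M}, I)$ denotes the set of solutions for a source instance $I$ under a schema mapping $\mathcal{M}$, $\mathrm{USol}(\mathcal{M}, I)$ the set of universal solutions, and $\mathrm{cert}_{\mathcal{M}}(q, I)$ the certain answers of a target query $q$. The last line shows the general shape of an s-t tgd, where $\varphi$ is a conjunction of atoms over the source schema and $\psi$ is a conjunction of atoms over the target schema.

```latex
\begin{align*}
\text{logical equivalence:} \quad
  & \mathrm{Sol}(\mathcal{M}, I) = \mathrm{Sol}(\mathcal{M}', I)
  && \text{for every source instance } I \\
\text{data-exchange equivalence:} \quad
  & \mathrm{USol}(\mathcal{M}, I) = \mathrm{USol}(\mathcal{M}', I)
  && \text{for every source instance } I \\
\text{conjunctive-query equivalence:} \quad
  & \mathrm{cert}_{\mathcal{M}}(q, I) = \mathrm{cert}_{\mathcal{M}'}(q, I)
  && \text{for every source instance } I \text{ and conjunctive query } q \\[1ex]
\text{s-t tgd (general form):} \quad
  & \forall \mathbf{x}\, \bigl( \varphi(\mathbf{x}) \rightarrow
      \exists \mathbf{y}\, \psi(\mathbf{x}, \mathbf{y}) \bigr)
\end{align*}
```

As a hypothetical instance of the last form, with a source relation $E$ and a target relation $F$ chosen only for illustration, the dependency $\forall x\, \forall y\, (E(x, y) \rightarrow \exists z\, (F(x, z) \wedge F(z, y)))$ requires every source edge to be matched by a two-step path in the target.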
