Inverting Schema Mappings: Bridging the Gap between Theory and Practice

The inversion of schema mappings has been identified as one of the fundamental operators for the development of a general framework for metadata management. In fact, in recent years three alternative notions of inversion for schema mappings have been proposed (Fagin-inverse [10], quasi-inverse [14] and maximum recovery [2]). However, the procedures that have been developed for computing these operators have some features that limit their practical applicability. First, these algorithms work in exponential time and produce inverse mappings of exponential size. Second, they express inverses in mapping languages that include features which are difficult to use in practice. A typical example is the use of disjunction in the conclusion of the mapping rules, which makes the process of exchanging data much more complicated. In this paper, we propose solutions for the two problems mentioned above. First, we provide a polynomial-time algorithm that computes the three inverse operators mentioned above given a mapping specified by a set of tuple-generating dependencies (tgds). This algorithm uses an output mapping language that can express these three operators in a compact way and, in fact, can compute inverses for a much larger class of mappings. Unfortunately, it has already been proved that mapping languages of this type must include some features that are difficult to use in practice and, hence, this is also the case for our output mapping language. Thus, as our second contribution, we propose a new and natural notion of inversion that overcomes this limitation. In particular, every mapping specified by a set of tgds admits an inverse under this new notion that can be expressed in a mapping language that slightly extends tgds, and that has the same good properties for data exchange as tgds. Finally, as our last contribution, we provide an algorithm for computing such inverses.
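
To illustrate why disjunction can appear in the conclusion of an inverse rule, consider the following small sketch. The relation names A, B and C are hypothetical and do not come from the paper; the example is the standard textbook-style case of two source relations copied into one target relation, and it is offered only as an illustration, not as the paper's own construction.

```latex
% Assumed source schema {A, B}, target schema {C}; names are illustrative.
% Mapping M, specified by two tgds:
\[
  M:\quad A(x) \rightarrow C(x), \qquad B(x) \rightarrow C(x)
\]
% A target fact C(a) may have come from A(a) or from B(a), so a rule that
% recovers as much source information as possible has to keep both options:
\[
  M':\quad C(x) \rightarrow A(x) \lor B(x)
\]
```

Under these assumptions, exchanging data back through M' forces a choice between A and B for every C-fact, which is the kind of complication the abstract attributes to disjunctive conclusions.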

[1] Michael R. Genesereth, et al. Answering recursive queries using views, 1997, PODS '97.

[2] Ronald Fagin, et al. Locally consistent transformations and query answering in data exchange, 2004, PODS '04.

[3] Ronald Fagin, et al. Composing schema mappings: second-order dependencies to the rescue, 2004, PODS '04.

[4] Alon Y. Halevy, et al. MiniCon: A scalable algorithm for answering queries using views, 2000, The VLDB Journal.

[5] Maurizio Lenzerini, et al. On reconciling data exchange, data integration, and peer data management, 2007, PODS '07.

[6] Ronald Fagin, et al. Towards a theory of schema-mapping optimization, 2008, PODS.

[7] Erhard Rahm, et al. Supporting executable mappings in model management, 2005, SIGMOD '05.

[8] Tomasz Imielinski, et al. Incomplete Information in Relational Databases, 1984, JACM.

[9] Phokion G. Kolaitis, et al. Structural characterizations of schema-mapping languages, 2009, ICDT '09.

[10] Jayant Madhavan, et al. Composing Mappings Among Data Sources, 2003, VLDB.

[11] Alin Deutsch, et al. Optimization Properties for Classes of Conjunctive Regular Path Queries, 2001, DBPL.

[12] Paolo Papotti, et al. Nested mappings: schema mapping reloaded, 2006, VLDB.

[13] Philip A. Bernstein, et al. Applying Model Management to Classical Meta Data Problems, 2003, CIDR.

[14] Alon Y. Halevy, et al. Answering queries using views: A survey, 2001, The VLDB Journal.

[15] Ronald Fagin. Inverting schema mappings, 2007.

[16] Maurizio Lenzerini, et al. Data integration: a theoretical perspective, 2002, PODS.

[17] Andrew B. Whinston, et al. Model management, 1994.

[18] Philip A. Bernstein, et al. Model management 2.0: manipulating richer mappings, 2007, SIGMOD '07.

[19] Jaroslav Nesetril, et al. Graphs and homomorphisms, 2004, Oxford lecture series in mathematics and its applications.

[20] Sergey Melnik, et al. Generic Model Management: Concepts And Algorithms (Lecture Notes in Computer Science), 2004.

[21] Ronald Fagin, et al. Horn clauses and database dependencies, 1982, JACM.

[22] Marcelo Arenas, et al. The recovery of a schema mapping: bringing exchanged data back, 2008, TODS.

[23] Ronald Fagin, et al. Data exchange: semantics and query answering, 2003, Theor. Comput. Sci.

[24] Sergey Melnik, et al. Generic Model Management, 2004, Lecture Notes in Computer Science.

[25] Ronald Fagin, et al. Quasi-inverses of schema mappings, 2007, PODS '07.

[26] Dan Suciu, et al. The Piazza peer data management system, 2004, IEEE Transactions on Knowledge and Data Engineering.