The language of plain SO-tgds: Composition, inversion and structural properties

The problems of composing and inverting schema mappings specified by source-to-target tuple-generating dependencies (st-tgds) have attracted a lot of attention, as they are of fundamental importance for the development of Bernstein's metadata management framework. In the case of the composition operator, a natural semantics has been proposed and the language of second-order tuple-generating dependencies (SO-tgds) has been identified as the right language to express it. In the case of the inverse operator, several semantics have been proposed, most notably the maximum recovery, the only inverse notion that guarantees that every mapping specified by st-tgds is invertible. Unfortunately, less attention has been paid to combining both operators, which is the motivation of this paper. More precisely, we start our investigation by showing that SO-tgds are not good for inversion, as there exist mappings specified by SO-tgds that are not invertible under any of the notions of inversion proposed in the literature. To overcome this limitation, we borrow the notion of CQ-composition, which is a relaxation obtained by parameterizing the composition of mappings by the class of conjunctive queries (CQ), and we propose a restriction over the class of SO-tgds that gives rise to the language of plain SO-tgds. Then we show that plain SO-tgds are the right language to express the CQ-composition of mappings given by st-tgds, in the same sense that SO-tgds are the right language to express the composition of st-tgds, and we prove that every mapping specified by a plain SO-tgd admits a maximum recovery, thus showing that plain SO-tgds behave well with respect to inversion.
Moreover, we show that the language of plain SO-tgds shares some fundamental structural properties with the language of st-tgds, while being much more expressive, and we provide a polynomial-time algorithm to compute maximum recoveries for mappings specified by plain SO-tgds (which can also be used to compute maximum recoveries for mappings given by st-tgds). All these results suggest that the language of plain SO-tgds is a good alternative to be implemented in data exchange and data integration applications.
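
To make the three dependency languages named above concrete, here is an illustrative sketch in LaTeX. It is not taken from this abstract: the relation names Emp, Mgr, Mgr1 and SelfMgr are borrowed from the well-known employee/manager example in the composition literature, and the reading of "plain" used here (no equalities between terms in premises, no nested function terms) is an assumption about the restriction the paper has in mind.

```latex
% Illustrative only: relation names and the reading of "plain" are assumptions.

% An st-tgd: every employee has some (unnamed) manager.
\[
  \forall e \,\bigl( \mathit{Emp}(e) \rightarrow \exists m \, \mathit{Mgr}(e, m) \bigr)
\]

% An SO-tgd: the existential is replaced by a function symbol f,
% and the second conjunct uses an equality between terms in its premise.
\[
  \exists f \,\Bigl(
    \forall e \,\bigl( \mathit{Emp}(e) \rightarrow \mathit{Mgr1}(e, f(e)) \bigr)
    \;\wedge\;
    \forall e \,\bigl( \mathit{Emp}(e) \wedge e = f(e) \rightarrow \mathit{SelfMgr}(e) \bigr)
  \Bigr)
\]

% A plain SO-tgd under the assumed reading: premises contain only relational
% atoms, and conclusions use only variables or terms f(x) over variables.
\[
  \exists f \, \forall e \,\bigl( \mathit{Emp}(e) \rightarrow \mathit{Mgr1}(e, f(e)) \bigr)
\]
```

Under this reading, the second formula is an SO-tgd but not a plain one, because of the premise equality e = f(e); the third formula stays within the plain fragment.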
