The structure of inverses in schema mappings

A schema mapping is a specification that describes how data structured under one schema (the source schema) is to be transformed into data structured under a different schema (the target schema). The notion of an inverse of a schema mapping is subtle, because a schema mapping may associate many target instances with each source instance, and many source instances with each target instance. In PODS 2006, Fagin defined a notion of the inverse of a schema mapping. This notion is tailored to the types of schema mappings that commonly arise in practice (those specified by “source-to-target tuple-generating dependencies”, or s-t tgds). We resolve the key open problem of the complexity of deciding whether there is an inverse. We also explore a number of interesting questions, including: What is the structure of an inverse? When is the inverse unique? How many nonequivalent inverses can there be? When does an inverse have an inverse? How big must an inverse be? Surprisingly, these questions are all interrelated. We show that for schema mappings M specified by full s-t tgds (those with no existential quantifiers), if M has an inverse, then it has a polynomial-size inverse of a particularly nice form, and there is a polynomial-time algorithm for generating it. We introduce the notion of “essential conjunctions” (or “essential atoms” in the full case), and show that they play a crucial role in the study of inverses. We use them to give greatly simplified proofs of some known results about inverses. What emerges is a much deeper understanding of this fundamental and complex operator.
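
As a small illustration of the distinction between full and non-full s-t tgds mentioned above, the sketch below writes out one dependency of each kind. The relation symbols P (source) and Q (target) are hypothetical and chosen only for this example; they do not come from the paper.

```latex
% General shape of an s-t tgd: \varphi is a conjunction of source atoms,
% \psi is a conjunction of target atoms.
\forall \mathbf{x}\, \bigl( \varphi(\mathbf{x}) \rightarrow \exists \mathbf{y}\, \psi(\mathbf{x}, \mathbf{y}) \bigr)

% A full s-t tgd (no existential quantifiers), with hypothetical P and Q:
\forall x\, \forall y\, \bigl( P(x, y) \rightarrow Q(x, y) \bigr)

% An s-t tgd that is not full, because z is existentially quantified:
\forall x\, \forall y\, \bigl( P(x, y) \rightarrow \exists z\, ( Q(x, z) \wedge Q(z, y) ) \bigr)
```

In the full case every value in the target is copied from the source, whereas in the non-full case the value of z is left unspecified by the mapping.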
