Schema mapping discovery from data instances

We introduce a theoretical framework for discovering relationships between two database instances over distinct and unknown schemata. This framework is grounded in the context of data exchange. We formalize the problem of understanding the relationship between two instances as that of obtaining a schema mapping such that a minimum repair of this mapping provides a perfect description of the target instance given the source instance. We show that this definition yields “intuitive” results when applied to database instances derived from each other by basic operations. We study the complexity of decision problems related to this optimality notion for different logical languages and show that, even in very restricted cases, the problem is of high complexity.
