A schema mapping is a declarative specification of the relationship between instances of a source schema and a target schema. The data exchange (or data translation) problem asks: given an instance over the source schema, materialize an instance (or solution) over the target schema that satisfies the schema mapping. In general, a given source instance may have numerous different solutions. Among all the solutions, universal solutions and core universal solutions have been singled out and extensively studied. A universal solution is a most general one and also represents the entire space of solutions, while a core universal solution is the smallest universal solution and is unique up to isomorphism (hence, we can talk about the core).
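To make the difference between a universal solution and the core concrete, here is a minimal Python sketch; it is not the method of the paper. It assumes a hypothetical mapping with two s-t tgds, Boss(x, z) → Mgr(x, z) and Emp(x) → ∃y Mgr(x, y). The relation names, the `_N` prefix used to mark labeled nulls, and the functions `canonical_solution` and `core` are all illustrative assumptions. The sketch first builds a universal solution by applying both tgds naively, then shrinks it by brute-force search over endomorphisms, which is exponential and only suitable for tiny instances.

```python
from itertools import product


def is_null(value):
    # Assumption: labeled nulls are strings starting with "_N"; constants never are.
    return value.startswith("_N")


def canonical_solution(emp, boss):
    """Apply both tgds naively, inventing one fresh labeled null per Emp fact."""
    target = {("Mgr", x, z) for (x, z) in boss}
    for i, x in enumerate(sorted(emp)):
        target.add(("Mgr", x, f"_N{i}"))
    return target


def core(instance):
    """Repeatedly replace the instance by a strictly smaller endomorphic image."""
    current = set(instance)
    while True:
        values = sorted({v for fact in current for v in fact[1:]})
        nulls = [v for v in values if is_null(v)]
        smaller = None
        # Try every assignment of the nulls to values of the instance;
        # constants are always mapped to themselves.
        for choice in product(values, repeat=len(nulls)):
            h = dict(zip(nulls, choice))
            image = {(f[0],) + tuple(h.get(v, v) for v in f[1:]) for f in current}
            if image < current:  # proper subset: a strictly smaller solution
                smaller = image
                break
        if smaller is None:
            return current  # no shrinking endomorphism exists, so this is the core
        current = smaller


universal = canonical_solution({"a", "b"}, {("a", "c")})
minimal = core(universal)
```

For the source instance Emp = {a, b}, Boss = {(a, c)}, the naive solution contains three facts, Mgr(a, c), Mgr(a, _N0), and Mgr(b, _N1). The core keeps only Mgr(a, c) and Mgr(b, _N1), because the null _N0 can be mapped to the constant c.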
The problem of designing efficient algorithms for computing the core has attracted considerable attention in recent years. In this paper, we present a method for directly computing the core by SQL queries, when schema mappings are specified by source-to-target tuple-generating dependencies (s-t tgds). Unlike prior methods that, given a source instance, first compute a target instance and then recursively minimize that instance to the core, our method avoids the construction of such intermediate instances. This is done by rewriting the schema mapping into a laconic schema mapping that is specified by first-order s-t tgds with a linear order on the active domain of the source instances. A laconic schema mapping has the property that a "direct translation" of the source instance according to the laconic schema mapping produces the core. Furthermore, a laconic schema mapping can be easily translated into SQL, and hence can be optimized and executed by a database system to produce the core. We also show that our results are optimal: the use of the linear order is inevitable and, in general, schema mappings with constraints over the target schema cannot be rewritten to a laconic schema mapping.
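The idea of a "direct translation" into SQL can be illustrated with a small, hedged sketch that uses Python's standard `sqlite3` module. It assumes a hypothetical mapping with the s-t tgds Boss(x, z) → Mgr(x, z) and Emp(x) → ∃y Mgr(x, y), and a hand-written rewriting in which the second tgd fires only when no Boss fact already supplies a manager: Emp(x) ∧ ¬∃z Boss(x, z) → ∃y Mgr(x, y). The table names, the function `exchange_core`, and the encoding of labeled nulls as strings of the form `_N(x)` are assumptions made for illustration. This toy case happens not to need the linear order that the general construction relies on.

```python
import sqlite3


def exchange_core(emp, boss):
    """Populate Mgr directly from the source tables using the rewritten mapping."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Emp(x TEXT);
        CREATE TABLE Boss(x TEXT, z TEXT);
        CREATE TABLE Mgr(x TEXT, y TEXT);
    """)
    conn.executemany("INSERT INTO Emp VALUES (?)", [(x,) for x in emp])
    conn.executemany("INSERT INTO Boss VALUES (?, ?)", list(boss))
    conn.executescript("""
        -- Boss(x, z) -> Mgr(x, z): copy the known managers.
        INSERT INTO Mgr SELECT DISTINCT x, z FROM Boss;

        -- Emp(x) AND NOT EXISTS z Boss(x, z) -> EXISTS y Mgr(x, y):
        -- invent a labeled null only for employees without a known manager.
        INSERT INTO Mgr
        SELECT DISTINCT e.x, '_N(' || e.x || ')'
        FROM Emp e
        WHERE NOT EXISTS (SELECT 1 FROM Boss b WHERE b.x = e.x);
    """)
    rows = conn.execute("SELECT x, y FROM Mgr ORDER BY x, y").fetchall()
    conn.close()
    return rows


result = exchange_core(["a", "b"], [("a", "c")])
```

On the source instance Emp = {a, b}, Boss = {(a, c)}, the queries produce Mgr(a, c) and Mgr(b, _N(b)) in a single pass over the source, with no intermediate target instance to minimize afterwards.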