Scalable data exchange with functional dependencies

The recent literature has provided a solid theoretical foundation for the use of schema mappings in data-exchange applications. Following this formalization, new algorithms have been developed to generate optimal solutions for mapping scenarios in a highly scalable way by relying on SQL. However, these algorithms suffer from a serious drawback: they cannot handle key constraints and functional dependencies on the target, i.e., equality-generating dependencies (egds). While egds play a crucial role in the generation of optimal solutions, handling them with first-order languages is a difficult problem. We start from a negative result: it is not always possible to compute solutions for scenarios with egds using an SQL script. We then identify many practical cases in which this is possible, and develop a best-effort algorithm that computes solutions in these cases. Experimental results show that our algorithm produces solutions of better quality than those produced by previous algorithms, and scales well to large databases.
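To make the role of a target egd concrete, here is a minimal sketch of the textbook chase step for a key constraint. It is not the SQL-based algorithm described above. It assumes a single target relation whose tuples may contain labeled nulls (placeholder values invented by the mapping), and an egd stating that one column is a key: whenever two tuples agree on the key, their other values must be equal. The names `Null`, `EgdViolation`, and `chase_key_egd` are hypothetical. Equating a null with a value replaces that null everywhere in the instance. Equating two distinct constants means the chase fails.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Null:
    """A labeled null invented by the mapping, e.g. Null("N1") (hypothetical type)."""
    label: str


Value = Union[str, Null]
Row = Tuple[Value, ...]


class EgdViolation(Exception):
    """Raised when the egd would equate two distinct constants (chase failure)."""


def chase_key_egd(rows: List[Row], key: int) -> List[Row]:
    """Chase the key egd R(k, x...) & R(k, y...) -> x_i = y_i on column `key`.

    Sketch only: repeatedly finds two tuples that agree on the key but differ
    elsewhere, unifies the differing values (a null is replaced by the other
    value throughout the instance), and restarts until a fixpoint is reached.
    """
    rows = list(rows)
    changed = True
    while changed:
        changed = False
        by_key: Dict[Value, Row] = {}
        for row in rows:
            other = by_key.setdefault(row[key], row)
            if other is row:
                continue  # first tuple seen with this key value
            for a, b in zip(other, row):
                if a == b:
                    continue
                if isinstance(a, Null):
                    old, new = a, b
                elif isinstance(b, Null):
                    old, new = b, a
                else:
                    raise EgdViolation(f"cannot equate constants {a!r} and {b!r}")
                # Substitute the null everywhere, then rescan from scratch,
                # since one substitution can enable further egd applications.
                rows = [tuple(new if v == old else v for v in r) for r in rows]
                changed = True
                break
            if changed:
                break
    # Tuples that became identical after unification collapse into one.
    return sorted(set(rows), key=repr)


if __name__ == "__main__":
    target = [("IBM", Null("N1")), ("IBM", "NY"), ("SAP", Null("N2"))]
    print(chase_key_egd(target, key=0))
```

Here the key on column 0 forces `N1` to become `"NY"`, so the two `IBM` tuples merge. The untouched `SAP` tuple keeps its null. The loop also shows why substitutions can cascade: unifying one null may create new key matches elsewhere. This iterate-until-fixpoint behaviour is the part that does not map directly onto a fixed, non-recursive SQL script.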
