Schema merging and mapping creation for relational sources

We address the problem of generating a mediated schema from a set of relational data source schemas and conjunctive queries that specify where those schemas overlap. Unlike past approaches that generate only the mediated schema, our algorithm also generates view definitions, i.e., source-to-mediated schema mappings. Our main goal is to understand the requirements that a mediated schema and views should satisfy, such as completeness, preservation of overlapping information, normalization, and minimality. We show how these requirements influence the detailed structure of the schemas and view definitions that are produced. We introduce a normal form for mediated schemas and view definitions, show how to generate them, and prove that schemas and views in this normal form satisfy our requirements. The view definitions in our normal form use stylized GLAV mappings, for which query rewriting is easier than for general GLAV mappings. We demonstrate the efficiency of query rewriting in a prototype implementation.
