Data-driven understanding and refinement of schema mappings

At the heart of many data-intensive applications is the problem of quickly and accurately transforming data into a new form. Database researchers have long advocated the use of declarative queries for this process. Yet tools for creating, managing and understanding the complex queries necessary for data transformation are still too primitive to permit widespread adoption of this approach. We present a new framework that uses data examples as the basis for understanding and refining declarative schema mappings. We identify a small set of intuitive operators for manipulating examples. These operators permit a user to follow and refine an example by walking through a data source. We show that our operators are powerful enough both to identify a large class of schema mappings and to distinguish effectively between alternative schema mappings. These operators permit a user to quickly and intuitively build and refine complex data transformation queries that map one data source into another.

[1]  Ronald Fagin,et al.  A simplied universal relation assumption and its properties , 1982, TODS.

[2]  Jeffrey D. Ullman,et al.  SYSTEM/U: a database system based on the universal relation assumption , 1984, TODS.

[3]  David Maier,et al.  Relaxing the universal relation scheme assumption , 1985, ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems.

[4]  Jeffrey D. Ullman,et al.  Principles of Database and Knowledge-Base Systems, Volume II , 1988, Principles of computer science series.

[5]  César A. Galindo-Legaria,et al.  Outerjoins as disjunctions , 1994, SIGMOD '94.

[6]  Balakrishna R. Iyer,et al.  Hypergraph based reorderings of outer join queries with complex predicates , 1995, SIGMOD '95.

[7]  Jeffrey D. Ullman,et al.  Integrating information by outerjoins and full disjunctions (extended abstract) , 1996, PODS.

[8]  Anand Rajaraman,et al.  Integrating Information by Outerjoins and Full Disjunctions , 1996, PODS 1996.

[9]  Arnon Rosenthal,et al.  Outerjoin simplification and reordering for query optimization , 1997, TODS.

[10]  Laura M. Haas,et al.  Transforming Heterogeneous Data with Database Middleware: Beyond Integration , 1999, IEEE Data Eng. Bull..

[11]  Alin Deutsch,et al.  Physical Data Independence, Constraints, and Optimization with Universal Plans , 1999, VLDB.

[12]  Elke A. Rundensteiner Letter from the Special Issue Editor , 1999, IEEE Data Eng. Bull..

[13]  Andy Chou,et al.  Scalable Spreadsheets for Interactive Data Analysis , 1999, 1999 ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery.

[14]  Laura M. Haas,et al.  Schema Mapping as Query Discovery , 2000, VLDB.

[15]  Laura M. Haas,et al.  The Clio project: managing heterogeneity , 2001, SGMD.

[16]  Felix Naumann,et al.  Attribute classification using feature analysis , 2002, Proceedings 18th International Conference on Data Engineering.