Schema AND Data: A Holistic Approach to Mapping, Resolution and Fusion in Information Integration

To integrate information, data in different formats, from different, potentially overlapping sources, must be related and transformed to meet the users' needs. Ten years ago, Clio introduced nonprocedural schema mappings to describe the relationship between data in heterogeneous schemas. This enabled powerful tools for mapping discovery and integration code generation, greatly simplifying the integration process. However, further progress is needed. We see an opportunity to raise the level of abstraction further, to encompass both data- and schema-centric integration tasks and to isolate applications from the details of how the integration is accomplished. Holistic information integration supports iteration across the various integration tasks, leveraging information about both schema and data to improve the integrated result. Integration independence allows applications to be independent of how, when, and where information integration takes place, making materialization and the timing of transformations an optimization decision that is transparent to applications. In this paper, we define these two important goals, and propose leveraging data mappings to create a framework that supports both data- and schema-level integration tasks.
