Feedback-based annotation, selection and refinement of schema mappings for dataspaces

The specification of schema mappings has proved to be time- and resource-consuming and has been recognized as a critical bottleneck to the large-scale deployment of data integration systems. In an attempt to address this issue, dataspaces have been proposed as a data management abstraction that aims to reduce the up-front cost required to set up a data integration system by gradually specifying schema mappings through interaction with end users in a pay-as-you-go fashion. As a step in this direction, we explore an approach for incrementally annotating schema mappings using feedback obtained from end users. In doing so, we do not expect users to examine mapping specifications; rather, they comment on the results of queries evaluated using the mappings. Using annotations computed on the basis of user feedback, we present a method for selecting, from the set of candidate mappings, those to be used for query evaluation, taking into account user requirements in terms of precision and recall. To this end, we cast mapping selection as an optimization problem. Mapping annotations may reveal that the quality of schema mappings is poor. We therefore also show how feedback can be used to support the derivation of better-quality mappings from existing mappings through refinement. An evolutionary algorithm is used to efficiently and effectively explore the large space of mappings that can be obtained through refinement. The results of evaluation exercises show the effectiveness of our solution for annotating, selecting and refining schema mappings.
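
As a rough illustration of the annotation and selection steps described above, the following Python sketch estimates precision and recall for each candidate mapping from user feedback on result tuples, then picks the subset of mappings with the highest estimated recall subject to a minimum precision. Everything in it is an assumption made for illustration: the names `annotate` and `select`, the representation of a mapping as the set of tuples it returns, the feedback sets `expected` and `unexpected`, and the exhaustive search, which stands in for whatever optimization technique the paper actually uses.

```python
from itertools import combinations


def annotate(results, expected, unexpected):
    """Estimate (precision, recall) of a result set from user feedback.

    expected:   tuples the user has marked as correct answers (hypothetical input)
    unexpected: tuples the user has marked as wrong answers (hypothetical input)
    Tuples without feedback are ignored in both estimates.
    """
    tp = len(results & expected)
    fp = len(results & unexpected)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / len(expected) if expected else 0.0
    return precision, recall


def select(mappings, expected, unexpected, min_precision):
    """Pick the mapping subset with the best estimated recall
    whose estimated precision is at least min_precision.

    Exhaustive search over subsets: only workable for a handful of mappings.
    """
    best, best_recall = (), -1.0
    names = sorted(mappings)
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            # The answer of a subset is the union of its mappings' results.
            union = set().union(*(mappings[n] for n in subset))
            p, r = annotate(union, expected, unexpected)
            if p >= min_precision and r > best_recall:
                best, best_recall = subset, r
    return best, best_recall


# Toy data: each mapping is represented by the tuples it returns.
mappings = {
    "m1": {"a", "b", "x"},
    "m2": {"c"},
    "m3": {"b", "x", "y"},
}
expected = {"a", "b", "c", "d"}   # feedback: these tuples are correct
unexpected = {"x", "y"}           # feedback: these tuples are wrong

print(annotate(mappings["m1"], expected, unexpected))
print(select(mappings, expected, unexpected, min_precision=0.7))
```

On this toy data, `m1` alone falls below the precision floor of 0.7, but combining it with `m2` raises the estimated precision to 0.75 while covering three of the four expected tuples, so the pair is selected. The number of subsets grows exponentially with the number of candidate mappings, which is one reason to treat selection as an optimization problem rather than enumerate subsets as done here.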
