Effectiveness Bounds for Non-Exhaustive Schema Matching Systems

Semantic validation of the effectiveness of a schema matching system is traditionally performed by comparing system-generated mappings with those produced by human evaluators. In large-scale environments, the human effort required for this validation quickly becomes prohibitive. The performance of a matching system, however, is determined not only by the quality of its mappings but also by the efficiency with which it produces them, and improving efficiency quickly leads to a trade-off between efficiency and effectiveness. Establishing or obtaining a test collection large enough to measure this trade-off is often a severe obstacle. In this paper, we present a technique for determining lower and upper bounds on effectiveness measures for a certain class of schema matching system improvements, in order to reduce the required validation effort. The effectiveness bounds for a matching system improvement are derived solely from a comparison of the answer sets of the improved and the original matching system. The technique was developed in the context of improving efficiency in XML schema matching, but we believe it applies more generally to other retrieval systems facing scalability problems.
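The abstract does not spell out the bounds themselves, so the following is only a rough sketch of the general idea, not the paper's derivation. It assumes that a non-exhaustive improvement returns a subset of the original system's answer set, and that the number of correct mappings in the original answer set is already known from an earlier validation. The function name `effectiveness_bounds` and its parameters are hypothetical. Under those assumptions, simple counting bounds the number of correct mappings the improved system can retain, using answer set sizes alone.

```python
from fractions import Fraction


def effectiveness_bounds(orig_size, orig_correct, improved_size):
    """Bound precision and relative recall of an improved matcher.

    Assumptions (not taken from the paper): the improved answer set is a
    subset of the original one, and `orig_correct` of the `orig_size`
    original answers are known to be correct. Recall is measured relative
    to the correct answers found by the original system.
    """
    if not 0 <= orig_correct <= orig_size:
        raise ValueError("orig_correct must lie in [0, orig_size]")
    if not 0 <= improved_size <= orig_size:
        raise ValueError("improved_size must lie in [0, orig_size]")

    orig_incorrect = orig_size - orig_correct
    # Worst case: the improved set keeps every incorrect answer it can.
    min_correct = max(0, improved_size - orig_incorrect)
    # Best case: the improved set keeps only correct answers, if possible.
    max_correct = min(improved_size, orig_correct)

    def ratio_bounds(denominator):
        # A ratio is undefined when its denominator is zero.
        if denominator == 0:
            return None
        return (Fraction(min_correct, denominator),
                Fraction(max_correct, denominator))

    return {
        "precision": ratio_bounds(improved_size),
        "relative_recall": ratio_bounds(orig_correct),
    }


if __name__ == "__main__":
    # Original system: 100 answers, 40 correct. Improved system: 70 answers.
    bounds = effectiveness_bounds(100, 40, 70)
    for measure, interval in bounds.items():
        print(measure, interval)
```

In this example the improved system retains between 10 and 40 correct mappings, so its precision lies between 1/7 and 4/7 and its recall relative to the original lies between 1/4 and 1. The bounds tighten as the improved answer set approaches the original in size, which is what makes a pure answer-set comparison informative without new human judgments.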
