BMatch: a Semantically Context-based Tool Enhanced by an Indexing Structure to Accelerate Schema Matching

Schema matching is a crucial task for gathering information from the same domain. This is especially true on the web, where a large number of data sources are available and need to be matched. However, the schema matching process is still largely performed manually or semi-automatically, which discourages the deployment of large-scale mediation systems. Indeed, these large-scale scenarios need a solution that ensures both acceptable matching quality and good performance. In this article, we present an approach for efficiently matching a large number of schemas. The quality aspect is based on the combination of terminological methods and a cosine measure between context vectors. The performance aspect relies on a B-tree indexing structure to reduce the search space. Finally, our approach, BMatch, has been implemented, and experiments with real sets of schemas show that it is scalable and provides acceptable matching quality when compared with the results obtained by the most referenced matching tools.

* Supported by ANR Research Grant ANR-05-MMSA0007.
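To make the quality aspect more concrete, the following minimal sketch shows one way a terminological measure and a cosine measure between context vectors could be combined into a single score. It is an illustration only, not the BMatch implementation: the choice of a trigram (Dice) measure, the way context vectors are built from neighbouring element names, the equal weighting, and all function names are assumptions introduced here.

```python
import math
from collections import Counter


def trigrams(label):
    """Return the set of character trigrams of a label (hypothetical helper)."""
    s = label.lower()
    if len(s) < 3:
        return {s}
    return {s[i:i + 3] for i in range(len(s) - 2)}


def trigram_similarity(a, b):
    """Dice coefficient over trigram sets; stands in for a terminological measure."""
    ta, tb = trigrams(a), trigrams(b)
    return 2 * len(ta & tb) / (len(ta) + len(tb))


def context_vector(neighbour_labels):
    """Assumed context vector: term counts over the labels of neighbouring elements."""
    return Counter(label.lower() for label in neighbour_labels)


def cosine(u, v):
    """Cosine measure between two sparse vectors stored as dict-like counters."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def combined_score(name_a, ctx_a, name_b, ctx_b, weight=0.5):
    """Weighted sum of the two measures; the weight is an assumed parameter."""
    term = trigram_similarity(name_a, name_b)
    ctx = cosine(context_vector(ctx_a), context_vector(ctx_b))
    return weight * term + (1 - weight) * ctx


if __name__ == "__main__":
    score = combined_score(
        "price", ["item", "currency", "order"],
        "unitPrice", ["item", "currency", "invoice"],
    )
    print(round(score, 3))
```

In this sketch, two elements with similar names and similar surroundings score higher than elements that agree on only one of the two signals. An indexing structure such as the B-tree mentioned above would then serve to limit which pairs are scored at all.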
