Benchmarking XML-Schema Matching Algorithms for Improving Automated Tuning

Several matching algorithms have recently been developed to automate or semi-automate the discovery of correspondences between XML schemas. These algorithms use a wide range of approaches and matching techniques, covering linguistic similarity, structural similarity, constraints, etc. The final matching arithmetically combines the different results stemming from these techniques. This aggregation often relies on many parameters and weights that must be adjusted manually. Generally, this task is carried out by human experts and requires a thorough understanding of the matching algorithm. To reduce human intervention and improve matching quality, we suggest automating the tuning of the various structural parameters used within XML-Schema matching algorithms. In this work, we offer a benchmark, for three tools, that seeks mathematical relations between parameter values and schema topology. Consequently, we propose an algorithm for tuning these parameters for the studied tools.
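To make the arithmetic combination of matcher results concrete, the following minimal sketch shows one common form of aggregation: a weighted average of per-technique similarity scores for a single pair of schema elements. The function name `aggregate_similarity`, the technique labels, and all numeric values are illustrative assumptions; they are not taken from the benchmarked tools, whose actual aggregation formulas may differ.

```python
def aggregate_similarity(scores, weights):
    """Combine per-technique similarity scores (each in [0, 1]) into one value.

    Hypothetical helper: a plain weighted average, normalised by the sum of
    the weights so the result also stays in [0, 1].
    """
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same techniques")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * scores[name] for name in scores) / total_weight


# Illustrative values only: similarity of one element pair under three techniques.
scores = {"linguistic": 0.8, "structural": 0.5, "constraint": 1.0}
# These weights are the kind of parameter that is usually tuned by hand.
weights = {"linguistic": 0.5, "structural": 0.3, "constraint": 0.2}

combined = aggregate_similarity(scores, weights)
print(round(combined, 3))
```

In this sketch, raising the `structural` weight makes the combined score depend more heavily on schema topology, which is the kind of parameter the proposed automated tuning targets.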
