Matcher Composition Methods for Automatic Schema Matching

We address the problem of automatically deciding whether two schema elements match (that is, refer to the same real-world object or concept), and propose several methods for combining the evidence computed by multiple basic matchers. One class of methods uses Bayesian networks to account for the conditional dependencies among the similarity values produced by individual matchers that use the same or similar information, thereby avoiding overconfident match probability estimates and improving matching accuracy. Another class of methods relies on optimization switches that mitigate these dependencies in a domain-independent manner. Experimental results under several testing protocols suggest that the matching accuracy of the Bayesian composite matchers can significantly exceed that of the individual component matchers, and that careful selection of optimization switches can improve matching accuracy even further.
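To illustrate why dependency between matchers matters, the following minimal sketch compares a naive combination, which treats two matchers as conditionally independent, with one that models the second matcher as conditioned on the first. It is not the method of this paper: the prior, the likelihood values, and the function name `posterior` are all hypothetical numbers and names chosen for illustration only.

```python
# Illustrative sketch only: all probabilities below are made-up assumptions,
# not values from the paper.

def posterior(prior, lik_match, lik_nonmatch):
    """Bayes' rule for a binary hypothesis (match vs. non-match)."""
    num = prior * lik_match
    return num / (num + (1.0 - prior) * lik_nonmatch)

PRIOR = 0.1  # assumed prior probability that two schema elements match

# Assumed behaviour of each matcher taken alone:
# P(high similarity | match) = 0.8, P(high similarity | non-match) = 0.2
P_HIGH_MATCH = 0.8
P_HIGH_NONMATCH = 0.2

# Evidence: both matchers report high similarity.

# One matcher only.
single = posterior(PRIOR, P_HIGH_MATCH, P_HIGH_NONMATCH)

# Naive combination: multiply likelihoods as if the matchers were
# conditionally independent given the match status.
naive = posterior(PRIOR, P_HIGH_MATCH ** 2, P_HIGH_NONMATCH ** 2)

# Dependent combination: the second matcher uses similar information, so its
# output is conditioned on the first matcher's output as well (assumed values).
P_M2_HIGH_GIVEN_M1_HIGH_MATCH = 0.95
P_M2_HIGH_GIVEN_M1_HIGH_NONMATCH = 0.70
dependent = posterior(
    PRIOR,
    P_HIGH_MATCH * P_M2_HIGH_GIVEN_M1_HIGH_MATCH,
    P_HIGH_NONMATCH * P_M2_HIGH_GIVEN_M1_HIGH_NONMATCH,
)

print(f"single matcher:        {single:.3f}")
print(f"naive (independent):   {naive:.3f}")
print(f"dependency-aware:      {dependent:.3f}")
```

Under these assumed numbers, the naive estimate is much higher than the dependency-aware one, because the second matcher's agreement is counted as fresh evidence even though it largely repeats the first. A Bayesian network encodes this kind of conditional structure explicitly.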
