CMC: Combining Multiple Schema-Matching Strategies Based on Credibility Prediction

Schema matching is a key operation in data engineering, and combining multiple matching strategies is a promising technique for it. To overcome the limitations of existing combination systems and achieve better performance, this paper proposes CMC, a system that combines multiple matchers based on credibility prediction. CMC first predicts the accuracy of each matcher on the current matching task and calculates each matcher's credibility accordingly. These credibilities are then used as weights when aggregating the matching results of the different matchers into a single combined result. Experiments on real-world schemas validate the merits of the system.
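
The weighting step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the credibility formula or the accuracy predictor, so everything here is an assumption: credibility is taken to be the predicted accuracy normalized to sum to 1, each matcher's result is a similarity matrix over (source element, target element) pairs, and the names `credibilities` and `combine` are hypothetical.

```python
def credibilities(predicted_accuracy):
    """Turn predicted accuracies into weights that sum to 1.

    Assumption: simple normalization. The paper's actual credibility
    formula is not stated in the abstract.
    """
    total = sum(predicted_accuracy)
    if total == 0:
        # No matcher is predicted to be useful: fall back to equal weights.
        return [1.0 / len(predicted_accuracy)] * len(predicted_accuracy)
    return [a / total for a in predicted_accuracy]


def combine(similarity_matrices, predicted_accuracy):
    """Aggregate per-matcher similarity matrices by credibility-weighted sum."""
    weights = credibilities(predicted_accuracy)
    rows = len(similarity_matrices[0])
    cols = len(similarity_matrices[0][0])
    return [
        [
            sum(w * m[i][j] for w, m in zip(weights, similarity_matrices))
            for j in range(cols)
        ]
        for i in range(rows)
    ]


# Two hypothetical matchers scoring a 2x2 matching task.
name_sim = [[0.9, 0.1], [0.2, 0.8]]  # e.g. a name-based matcher
type_sim = [[0.5, 0.5], [0.4, 0.6]]  # e.g. a data-type matcher

# Made-up predicted accuracies for this task (illustrative only).
combined = combine([name_sim, type_sim], predicted_accuracy=[0.75, 0.25])
```

With these illustrative numbers the matcher predicted to be more accurate dominates the combined matrix, which is the intended effect of weighting by credibility rather than averaging all matchers equally.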
