From Diversity-based Prediction to Better Ontology & Schema Matching

Ontology & schema matching predictors assess the quality of matchers in the absence of an exact match. We propose MCD (Match Competitor Deviation), a new diversity-based predictor that compares the strength of a matcher's confidence in the correspondence of a concept pair with its confidence in the other correspondences that involve either concept. We also propose using MCD as a regulator that controls the balance between Precision and Recall, and we apply it to 1:1 matching by combining it with a similarity measure based on solving a maximum weight bipartite graph matching (MWBM) problem. Optimizing the combined measure is known to be an NP-Hard problem. Therefore, we propose CEM, which approximates an optimal match by efficiently scanning multiple possible matches using rare event estimation. In a thorough empirical study over several real-world benchmark datasets, we show that MCD outperforms other state-of-the-art predictors and that CEM significantly outperforms existing matchers.
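
To make the two ingredients above more concrete, the following is a minimal sketch and not the paper's definition of MCD or CEM. It assumes a small square similarity matrix `sim` of matcher confidences, takes the "competitors" of a pair to be all entries sharing its row or column, and measures deviation from their mean; the function names and the example values are hypothetical. The 1:1 match is found by exhaustive search, which is only feasible for tiny inputs and stands in for a proper MWBM solver.

```python
from itertools import permutations


def competitor_deviation(sim, i, j):
    """Deviation of sim[i][j] from the mean confidence over row i and column j.

    Assumption for illustration: the competitors of pair (i, j) are every
    entry that involves concept i or concept j, including the pair itself.
    """
    row = list(sim[i])
    column_others = [sim[r][j] for r in range(len(sim)) if r != i]
    competitors = row + column_others
    return sim[i][j] - sum(competitors) / len(competitors)


def best_one_to_one(sim):
    """Exhaustive maximum-weight 1:1 match for a small square matrix."""
    n = len(sim)
    best = max(
        permutations(range(n)),
        key=lambda p: sum(sim[i][p[i]] for i in range(n)),
    )
    return [(i, best[i]) for i in range(n)]


# Hypothetical matcher confidences: rows and columns are concepts
# of the two schemas being matched.
sim = [
    [0.9, 0.2, 0.1],
    [0.3, 0.8, 0.4],
    [0.1, 0.5, 0.7],
]

strong = competitor_deviation(sim, 0, 0)  # stands out from its competitors
weak = competitor_deviation(sim, 0, 1)    # falls below its competitors
match = best_one_to_one(sim)
```

In this sketch a large positive deviation marks a correspondence that clearly dominates its competitors, while a negative one marks a correspondence that is weaker than the alternatives for the same concepts. Exhaustive search grows factorially with the number of concepts, which is why a scalable approach has to rely on approximation.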
