Uncertain Schema Matching

Schema matching is the task of providing correspondences between concepts that describe the meaning of data in heterogeneous, distributed data sources. It is one of the basic operations required by data and schema integration, and thus has a great effect on the outcomes of that process, whether these involve targeted content delivery, view integration, database integration, query rewriting over heterogeneous sources, duplicate data elimination, or automatic streamlining of workflow activities that involve heterogeneous data sources. Although schema matching research has been ongoing for over 25 years, only more recently has the realization emerged that schema matchers are inherently uncertain. Since 2003, work on uncertainty in schema matching has picked up, along with research on uncertainty in other areas of data management. This lecture presents various aspects of uncertainty in schema matching within a single unified framework. We introduce basic formulations of uncertainty and provide several alternative representations of schema matching uncertainty. We then cover two common methods that have been proposed to deal with uncertainty in schema matching, namely ensembles and top-K matchings, and analyze them in this context. We conclude with a set of real-world applications.
