Element similarity measures in XML schema matching

Schema matching plays a central role in many XML-based applications. There is a growing need for high-performance matching systems that identify and discover semantic correspondences across XML data. XML schema matching methods face several challenges in defining, adopting, using, and combining element similarity measures. In this paper, we classify, review, and experimentally compare the major element similarity measures and their combinations. We aim to present a unified view that is useful when developing a new element similarity measure, implementing an XML schema matching component, using an XML schema matching system, or comparing XML schema matching systems.
