Evaluation of Terminological Schema Matching and Its Implications for Schema Mapping

Large amounts of schema data describing the data structures of domains such as purchase orders, health, publications, geography, agriculture, environment, and music have recently become available on the Web. Schema mapping aims to resolve the schema heterogeneity in such data. This research examines how string similarity metrics and text processing techniques affect the performance of terminological schema matching, and highlights their limitations. Our experimental study demonstrates that text processing techniques significantly improve the performance of terminological schema matching. However, the size of the improvement varies slightly across datasets because of their differing characteristics, and some datasets still exhibit low performance even when all text processing techniques are applied. Our research supports the claim that a system able to manage the context-dependent characteristics of terminological schema matching is essential for better schema mapping algorithms.
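
To make the interaction between text processing and string similarity concrete, the following is a minimal sketch and not the implementation evaluated in this study. It assumes a simple preprocessing step (splitting camelCase and delimiters, then lowercasing) and a normalized Levenshtein similarity; the function names `normalize`, `levenshtein`, and `similarity`, and the example element names, are hypothetical.

```python
import re


def normalize(name: str) -> str:
    """Illustrative preprocessing: split camelCase and delimiters, then lowercase."""
    # Insert a space at each lowercase/digit -> uppercase boundary (camelCase).
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    # Split on any run of non-alphanumeric characters (underscores, hyphens, spaces).
    tokens = re.split(r"[^A-Za-z0-9]+", spaced)
    return " ".join(t.lower() for t in tokens if t)


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


# Hypothetical schema element names that differ only in naming convention.
raw = similarity("purchaseOrderDate", "PURCHASE_ORDER_DATE")
cleaned = similarity(normalize("purchaseOrderDate"),
                     normalize("PURCHASE_ORDER_DATE"))
print(f"raw={raw:.2f} cleaned={cleaned:.2f}")
```

In this sketch the two names score poorly as raw strings but match exactly after normalization. Preprocessing of this kind cannot help when two labels share a meaning but not a surface form, which is the sort of context-dependent case the abstract points to.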
