A large dataset for the evaluation of ontology matching

Recently, the number of ontology matching techniques and systems has increased significantly, which makes their evaluation and comparison an increasingly pressing issue. One of the challenges in evaluating ontology matching is building large-scale evaluation datasets. In fact, the number of possible correspondences between two ontologies grows quadratically with the number of entities in these ontologies. This often makes the manual construction of evaluation datasets demanding to the point of being infeasible for large-scale matching tasks. In this paper, we present TaxME2, an ontology matching evaluation dataset composed of thousands of matching tasks. It was built semi-automatically out of the Google, Yahoo, and Looksmart web directories. We evaluated TaxME2 by exploiting the results of almost two dozen state-of-the-art ontology matching systems. The experiments indicate that the dataset possesses the desired key properties, namely, it is error-free, incremental, discriminative, monotonic, and hard for state-of-the-art ontology matching systems.
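To make the quadratic growth concrete, here is a minimal Python sketch. It assumes a candidate correspondence is modeled as a (source entity, target entity, relation) triple drawn from a fixed set of relations; the function names, the `relations` parameter, and the toy directory labels are hypothetical illustrations, not part of TaxME2 or of any matching system.

```python
from itertools import product


def candidate_correspondences(source_entities, target_entities, relations=("=",)):
    """Enumerate every candidate correspondence (s, t, r) between two ontologies.

    Assumption (for illustration only): a correspondence is a triple of a
    source entity, a target entity, and one relation from a fixed set.
    The default uses a single equivalence relation.
    """
    return list(product(source_entities, target_entities, relations))


def count_candidates(n, m, k=1):
    """Number of candidates for n source entities, m target entities, k relations.

    The count is n * m * k, so for two ontologies of similar size n it
    grows as n squared: doubling both ontologies quadruples the candidates.
    """
    return n * m * k


if __name__ == "__main__":
    # Hypothetical toy labels, loosely in the style of web directory categories.
    src = ["Arts", "Music", "Jazz"]
    tgt = ["Art", "Music", "Blues", "Jazz"]
    print(len(candidate_correspondences(src, tgt)))  # 3 * 4 = 12 candidates

    # Growth of the candidate space as both ontologies get larger.
    for n in (10, 100, 1000):
        print(n, count_candidates(n, n))
```

Even with a single relation, two ontologies of 1,000 entities each yield a million candidate pairs, each of which a human would in principle have to judge when building a reference alignment by hand.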
