Enabling semantic similarity estimation across multiple ontologies: An evaluation in the biomedical domain

The estimation of the semantic similarity between terms is a valuable tool for understanding textual resources. Many semantic similarity computation paradigms have been proposed, either as general-purpose solutions or tailored to concrete fields such as biomedicine. In particular, ontology-based approaches have been very successful due to their efficiency, scalability and lack of constraints, and thanks to the availability of large, consensus-based ontologies (such as WordNet or those in the UMLS). These measures, however, are hampered by the fact that they exploit only one ontology; hence, their recall depends on that ontology's detail and coverage. In recent years, some authors have extended existing methodologies to support multiple ontologies, tackling the integration of heterogeneous knowledge sources by means of simple terminological matching between ontological concepts. In this paper, we aim to improve these methods by analysing the similarity between the taxonomic knowledge and structure modelled in different ontologies. As a result, we are able to better discover the commonalities between ontologies and, hence, improve the accuracy of the similarity estimation. Two methods are proposed to tackle this task. They have been evaluated and compared with related works on several widely used benchmarks of biomedical terms, using two standard ontologies (WordNet and MeSH). Results show that our methods correlate better than related works with the similarity assessments provided by experts in biomedicine.
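To make the single-ontology versus multiple-ontology setting concrete, the following is a minimal sketch of a path-length (edge-counting) similarity over toy is-a taxonomies, combined with the kind of naive terminological matching described above for earlier multi-ontology work: concepts with identical labels are simply merged. The taxonomies `WORDNET_LIKE` and `MESH_LIKE`, the helper functions and the `1 / (1 + length)` scoring are hypothetical illustrations, not the paper's methods and not excerpts from WordNet or MeSH.

```python
from collections import deque

# Hypothetical toy taxonomies: each maps a concept label to its parent labels
# (is-a links). Illustrative only; NOT taken from WordNet or MeSH.
WORDNET_LIKE = {
    "condition": [],
    "disease": ["condition"],
    "infection": ["disease"],
    "influenza": ["infection"],
}
MESH_LIKE = {
    "diseases": [],
    "infection": ["diseases"],
    "virus diseases": ["infection"],
    "influenza": ["virus diseases"],
    "pneumonia": ["infection"],
}


def build_graph(*taxonomies):
    """Undirected adjacency over is-a links. Using labels as node ids means
    identical labels from different taxonomies are merged (naive matching)."""
    graph = {}
    for taxonomy in taxonomies:
        for child, parents in taxonomy.items():
            graph.setdefault(child, set())
            for parent in parents:
                graph.setdefault(parent, set())
                graph[child].add(parent)
                graph[parent].add(child)
    return graph


def path_length(graph, a, b):
    """Fewest is-a edges between a and b (breadth-first search), or None."""
    if a not in graph or b not in graph:
        return None
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def similarity(graph, a, b):
    """Map a path length to (0, 1] via 1 / (1 + length); None if unreachable."""
    length = path_length(graph, a, b)
    return None if length is None else 1.0 / (1.0 + length)


single = build_graph(MESH_LIKE)
merged = build_graph(WORDNET_LIKE, MESH_LIKE)
print(similarity(single, "influenza", "pneumonia"))  # 0.25 (3 edges)
print(similarity(single, "influenza", "condition"))  # None: "condition" is not covered
print(similarity(merged, "influenza", "pneumonia"))  # 0.3333333333333333 (2 edges)
print(similarity(merged, "pneumonia", "condition"))  # 0.25 (3 edges)
```

In this toy data, merging lets `pneumonia` (present only in `MESH_LIKE`) be compared with `condition` (present only in `WORDNET_LIKE`), which is the recall gain multi-ontology support aims for. It also shortens the influenza/pneumonia path from 3 to 2 edges, because the two sources place influenza at different depths; label matching alone does not reconcile such structural differences, which is one illustration of why looking at taxonomic structure, not just labels, can matter.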
