A framework for unifying ontology-based semantic similarity measures: A study in the biomedical domain

Ontologies are widely adopted in the biomedical domain to characterize various resources (e.g., diseases, drugs, scientific publications) with unambiguous meanings. Exploiting the structured knowledge that ontologies provide, researchers have defined a plethora of ad hoc and domain-specific semantic similarity measures in recent years. Nevertheless, some critical questions remain: which measure should be defined or chosen for a concrete application? Are some measures that appear different a priori in fact equivalent? To shed light on these questions, we perform an in-depth analysis of existing ontology-based measures to identify the core elements of semantic similarity assessment. As a result, this paper presents a unifying framework that aims to improve the understanding of semantic measures, to highlight their equivalences, and to propose bridges between their theoretical bases. By demonstrating that groups of measures are particular instantiations of parameterized functions, we unify a large number of state-of-the-art semantic similarity measures through common expressions. The application of the proposed framework and its practical usefulness are illustrated by an empirical analysis of hundreds of semantic measures in a biomedical context.
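To make the idea of measures as instantiations of a parameterized function concrete, the sketch below uses Tversky's ratio model over feature sets, a well-known parameterized similarity, as an illustration. It is not the framework defined in this paper. The function name `tversky_ratio`, the toy concept names, and the choice of ancestor sets as features are all assumptions made for the example. Setting both weights to 1 yields the Jaccard index, and setting both to 0.5 yields the Dice coefficient.

```python
def tversky_ratio(features_a, features_b, alpha, beta):
    """Tversky's ratio model with set cardinality as the salience function.

    sim = |A & B| / (|A & B| + alpha * |A - B| + beta * |B - A|)
    """
    a, b = set(features_a), set(features_b)
    common = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    denominator = common + alpha * only_a + beta * only_b
    # Convention assumed here: similarity is 0.0 when the denominator is zero
    # (for example, when both feature sets are empty).
    return common / denominator if denominator else 0.0


# Hypothetical ancestor sets (each including the concept itself) in a toy taxonomy.
viral = {"entity", "disease", "infection", "viral_infection"}
bacterial = {"entity", "disease", "infection", "bacterial_infection"}

# Two classical measures obtained purely by choosing parameter values.
jaccard = tversky_ratio(viral, bacterial, alpha=1.0, beta=1.0)  # 3 / 5
dice = tversky_ratio(viral, bacterial, alpha=0.5, beta=0.5)     # 3 / 4

print(f"Jaccard: {jaccard:.2f}, Dice: {dice:.2f}")
```

Here the two measures differ only in how strongly the non-shared features are weighted, which is the kind of equivalence between apparently distinct measures that a unifying view is meant to expose.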
