Disjunctive shared information between ontology concepts: application to Gene Ontology

Background: The large-scale effort in developing, maintaining and making biomedical ontologies available motivates the application of similarity measures to compare ontology concepts or, by extension, the entities described therein. A common approach, known as semantic similarity, compares ontology concepts through the information content they share in the ontology. However, different disjunctive ancestors in the ontology are frequently neglected, or not properly explored, by semantic similarity measures.

Results: This paper proposes a novel method, dubbed DiShIn, that effectively exploits the multiple inheritance relationships present in many biomedical ontologies. DiShIn calculates the shared information content of two ontology concepts, based on the information content of the disjunctive common ancestors of the concepts being compared. DiShIn identifies these disjunctive ancestors through the number of distinct paths from the concepts to their common ancestors.

Conclusions: DiShIn was applied to Gene Ontology and its performance was evaluated against state-of-the-art measures using CESSM, a publicly available evaluation platform of protein similarity measures. By modifying the way traditional semantic similarity measures calculate the shared information content, DiShIn obtained a statistically significantly higher correlation between semantic and sequence similarity. Moreover, the incorporation of DiShIn in existing applications that exploit multiple inheritance would reduce their execution time.
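The path-counting idea described above can be illustrated with a small sketch. It is not the authors' implementation. It rests on two assumptions that the abstract does not spell out: common ancestors are grouped by the absolute difference between the number of distinct paths reaching them from each concept, and the shared information content is the average over the most informative ancestor of each group. The toy ontology, the information content values, and the names `PARENTS`, `IC`, `path_counts` and `shared_ic` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy ontology with multiple inheritance: concept -> direct parents.
PARENTS = {
    "metal": (),
    "precious": ("metal",),
    "coinage": ("metal",),
    "gold": ("precious", "coinage"),
    "silver": ("precious", "coinage"),
    "platinum": ("precious",),
    "palladium": ("precious",),
    "copper": ("coinage",),
}

# Hypothetical information content values (the root carries no information).
IC = {
    "metal": 0.0, "precious": 1.2, "coinage": 1.0,
    "gold": 2.0, "silver": 2.0, "platinum": 2.5,
    "palladium": 2.5, "copper": 2.2,
}


def path_counts(concept, parents):
    """Map each ancestor (and the concept itself) to the number of
    distinct upward paths that reach it from `concept`."""
    counts = defaultdict(int)
    counts[concept] = 1
    for parent in parents[concept]:
        for ancestor, n in path_counts(parent, parents).items():
            counts[ancestor] += n
    return dict(counts)


def shared_ic(c1, c2, parents, ic):
    """Average the IC of the assumed disjunctive common ancestors:
    one ancestor (the most informative) per path-count difference."""
    p1 = path_counts(c1, parents)
    p2 = path_counts(c2, parents)
    best_per_diff = {}
    for ancestor in p1.keys() & p2.keys():
        diff = abs(p1[ancestor] - p2[ancestor])
        best_per_diff[diff] = max(best_per_diff.get(diff, 0.0), ic[ancestor])
    if not best_per_diff:  # no common ancestor at all
        return 0.0
    return sum(best_per_diff.values()) / len(best_per_diff)


if __name__ == "__main__":
    for pair in [("platinum", "palladium"), ("platinum", "gold"), ("gold", "silver")]:
        print(pair, shared_ic(*pair, PARENTS, IC))
```

In this toy graph, `platinum` and `palladium` reach every common ancestor through the same number of paths, so only their most informative common ancestor (`precious`) counts. For `platinum` and `gold`, `metal` is reached by one path from `platinum` and two from `gold`, so under the stated assumptions it is treated as an additional disjunctive ancestor and lowers the shared value relative to taking the most informative ancestor alone.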
