Exploring information from the topology beneath the Gene Ontology terms to improve semantic similarity measures.

Measuring the similarity between pairs of biological entities is important in molecular biology. The Gene Ontology (GO) provides a promising basis for quantifying the semantic similarity between two genes or gene products. Such similarity measures depend closely on the GO terms annotated to the biological entities under consideration and on the structure of the GO graph. However, previous work in this field has mainly focused on the upper part of the graph and has seldom considered the lower part, that is, the topology beneath the terms. In this study, we explore information from the lower part of the GO graph to obtain better semantic similarity measures. We propose a framework to quantify the similarity beneath a term pair, which takes into account both the information two ancestral terms share and the probability that they co-occur with their common descendants. The effectiveness of our approach was evaluated against seven typical measures on the public CESSM platform and on protein-protein interaction and gene expression datasets. Experimental results consistently show that the similarity derived from the lower part contributes to a better semantic similarity measure. The promising features of our approach are the following: (1) it provides a mirror model to characterize the information two ancestral terms share with respect to their common descendant; (2) it quantifies, in an efficient way, the probability that two terms co-occur with their common descendant; and (3) it effectively captures the similarity beneath two terms, which can serve as an add-on to improve traditional semantic similarity measures between two GO terms. The algorithm was implemented in Matlab and is freely available from http://ejl.org.cn/bio/GOBeneath/.
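
To make the idea of looking beneath a term pair concrete, the following is a minimal Python sketch on a toy graph. It is not the authors' Matlab implementation and does not reproduce their formulas: the term names, the annotation table, and the `co_occurrence` ratio (a simple overlap fraction) are all hypothetical choices made for illustration, and the information content is the standard annotation-frequency definition rather than anything specific to this paper.

```python
from math import log

# Toy DAG, child -> set of parents. Term names are hypothetical, not GO IDs.
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "C": {"A", "B"},
    "D": {"A"},
    "E": {"C"},
}

# Hypothetical direct annotations: gene -> set of terms.
ANNOTATIONS = {
    "g1": {"C"},
    "g2": {"D"},
    "g3": {"E"},
    "g4": {"B"},
}


def ancestors(term):
    """Return all ancestors of a term, including the term itself."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def descendants(term):
    """Return all descendants of a term, including the term itself."""
    return {t for t in PARENTS if term in ancestors(t)}


def common_descendants(t1, t2):
    """Terms lying beneath both t1 and t2."""
    return descendants(t1) & descendants(t2)


def genes_under(terms):
    """Genes annotated directly to at least one of the given terms."""
    return {g for g, ts in ANNOTATIONS.items() if ts & terms}


def information_content(term):
    """Annotation-frequency information content: -log p(term)."""
    p = len(genes_under(descendants(term))) / len(ANNOTATIONS)
    return float("inf") if p == 0 else -log(p)


def co_occurrence(t1, t2):
    """Illustrative overlap ratio (an assumption, not the paper's formula):
    the fraction of genes beneath t1 or t2 that are annotated to a
    common descendant of both."""
    union = genes_under(descendants(t1) | descendants(t2))
    shared = genes_under(common_descendants(t1, t2))
    return len(shared) / len(union) if union else 0.0


if __name__ == "__main__":
    print(sorted(common_descendants("A", "B")))
    print(co_occurrence("A", "B"))
    print(information_content("A"))
```

In this toy graph, terms A and B share the descendants C and E, so half of the genes annotated beneath either term fall under a common descendant. A measure built only on shared ancestors would see just the root for this pair and miss that overlap.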
