A New Path Based Hybrid Measure for Gene Ontology Similarity

Gene Ontology (GO) consists of a controlled vocabulary of terms, each annotating a gene or gene product, structured as a directed acyclic graph. In the graph, semantic relations connect the terms, which represent knowledge about the functions and cellular components of gene products. GO similarity gives a numerical representation of the biological relationship among a set of genes, which can be used to infer biological facts such as protein interaction, structural similarity, and gene clustering. Here we introduce a new shortest-path-based hybrid measure of ontological similarity between two terms that combines the structure of the GO graph with the information content of the terms. The similarity between two terms t1 and t2, referred to as GOSimPBHM(t1,t2), has two components: one obtained from the common ancestors of t1 and t2, and the other from their remaining ancestors. The proposed path-based hybrid measure does not suffer from the well-known shallow annotation problem. Its superiority over several other popular measures is established for protein-protein interaction prediction, correlation with gene expression, and functional classification of genes in a biological pathway. Finally, the proposed measure is used to compute the average GO similarity score among genes that are experimentally validated targets of some microRNAs (miRNAs). The results demonstrate that the targets of a given miRNA have a high degree of similarity in the biological process category of GO.
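The abstract names the ingredients of the measure (shortest paths in the GO graph, information content, and a split of the ancestors of t1 and t2 into common and remaining ones) but does not give the formula. The following is only a minimal sketch of those ingredients on a toy graph, not the GOSimPBHM definition itself. The graph, the term names, the annotation counts, and all function names are hypothetical, and the information content is assumed to be the usual negative log of annotation frequency.

```python
import math
from collections import deque

# Hypothetical toy DAG (child -> parents); these are not real GO terms.
PARENTS = {
    "root": [],
    "a": ["root"],
    "b": ["root"],
    "c": ["a"],
    "d": ["a", "b"],
    "e": ["c", "d"],
}

# Hypothetical number of genes annotated directly to each term.
DIRECT_COUNTS = {"root": 0, "a": 1, "b": 2, "c": 3, "d": 2, "e": 4}


def ancestors(term):
    """Return every ancestor of `term`, including the term itself."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def split_ancestors(t1, t2):
    """Split the ancestors of t1 and t2 into common and remaining sets."""
    a1, a2 = ancestors(t1), ancestors(t2)
    common = a1 & a2
    remaining = (a1 | a2) - common
    return common, remaining


def information_content(term):
    """IC(t) = -log p(t), where p(t) counts annotations to t or its descendants."""
    total = sum(DIRECT_COUNTS.values())
    count = sum(c for t, c in DIRECT_COUNTS.items() if term in ancestors(t))
    return -math.log(count / total)


def shortest_path_length(t1, t2):
    """Length in edges of the shortest path between t1 and t2, ignoring direction."""
    neighbours = {t: set() for t in PARENTS}
    for child, parents in PARENTS.items():
        for parent in parents:
            neighbours[child].add(parent)
            neighbours[parent].add(child)
    queue = deque([(t1, 0)])
    seen = {t1}
    while queue:
        node, dist = queue.popleft()
        if node == t2:
            return dist
        for nxt in neighbours[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # no path between the two terms


common, remaining = split_ancestors("c", "d")
print(sorted(common), sorted(remaining))
print(shortest_path_length("c", "d"), round(information_content("e"), 3))
```

In this toy graph, terms "c" and "d" share the ancestors "a" and "root", while "b", "c", and "d" fall into the remaining set. How the two resulting components are weighted and combined into a single score is specified in the body of the paper, not in this sketch.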
