A Novel Measure for Semantic Similarity Computation of Gene Ontology Terms Using Weighted Aggregation of Information Contents

Background: Gene Ontology (GO) is a well-structured knowledge base of biological terms that describes the roles of genes and their products in a standardized, organized controlled vocabulary. Over the last decade, many measures have been developed to exploit the advantages of GO to determine semantic similarities between biological entities. Existing GO-based semantic similarity measures try to address several constraints that arise when using GO ontologies. For instance, (1) edges in a GO graph do not represent uniform distances and the graph has regions of different density, and (2) ignoring term levels in the ontology leads to the "shallow annotation" drawback, i.e., two terms at a certain distance near the root of the GO graph receive the same semantic similarity as two terms at the same distance but far from the root. Methods: Here, we present wAIC, a two-stage hybrid semantic similarity measure using weighted aggregation of information contents. In wAIC, the impact of each common ancestor on the semantic similarity value is determined by the location of the ancestor in the ontology graph. wAIC also filters out, from the annotating term set, terms in the upper levels of the ontology graph to reduce the shallow annotation problem. Results: Experimental results confirm that the proposed measure is more consistent with the major related constraints: wAIC semantic similarity values correlate more strongly with both sequence similarity values and gene expression-based similarity values than those of state-of-the-art semantic similarity measures. Conclusions: wAIC shows that using a weighted aggregation of common ancestors is completely consistent with human perception and can improve the accuracy of gene similarity measurement.
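The general idea of a weighted aggregation of information contents can be illustrated with a minimal sketch. It is not the wAIC formulation itself, which the abstract does not specify. Several things here are assumptions made only for illustration: the toy ontology `PARENTS`, the annotation counts `DIRECT_COUNTS`, the use of term depth as the weight of each ancestor, the Dice-style normalization in `term_similarity`, and the depth threshold in `filter_shallow`.

```python
import math
from functools import lru_cache

# Hypothetical toy ontology (child -> parents); not real GO terms.
PARENTS = {
    "root": [],
    "A": ["root"], "B": ["root"],
    "C": ["A"], "D": ["A"],
    "E": ["C"], "F": ["C"],
}
# Hypothetical number of genes annotated directly to each term.
DIRECT_COUNTS = {"root": 0, "A": 2, "B": 10, "C": 2, "D": 4, "E": 3, "F": 3}


@lru_cache(maxsize=None)
def ancestors(term):
    """Return the term together with all of its ancestors."""
    result = {term}
    for parent in PARENTS[term]:
        result |= ancestors(parent)
    return frozenset(result)


@lru_cache(maxsize=None)
def depth(term):
    """Length of the longest path from the root to the term."""
    parents = PARENTS[term]
    return 0 if not parents else 1 + max(depth(p) for p in parents)


def information_content(term):
    """IC = -log(p), where p is the share of annotations at or below the term."""
    total = sum(DIRECT_COUNTS.values())
    covered = sum(c for t, c in DIRECT_COUNTS.items() if term in ancestors(t))
    return -math.log(covered / total)


def weighted_ic_sum(terms):
    """Aggregate IC over a term set, weighting each term by its depth (assumed weight)."""
    return sum(depth(t) * information_content(t) for t in sorted(terms))


def term_similarity(t1, t2):
    """Weighted IC of the common ancestors, normalized by both ancestor sets."""
    common = ancestors(t1) & ancestors(t2)
    denom = weighted_ic_sum(ancestors(t1)) + weighted_ic_sum(ancestors(t2))
    return 0.0 if denom == 0 else 2 * weighted_ic_sum(common) / denom


def filter_shallow(terms, min_depth):
    """Drop annotating terms that sit above the given depth in the graph."""
    return {t for t in terms if depth(t) >= min_depth}


if __name__ == "__main__":
    print(term_similarity("A", "B"))  # siblings directly under the root
    print(term_similarity("E", "F"))  # siblings three levels down
    print(filter_shallow({"root", "A", "E"}, min_depth=2))
```

In this toy graph, A/B and E/F are both sibling pairs at the same path distance, yet the deeper pair scores higher: the only common ancestor of A and B is the root, whose information content is zero, while E and F share the more specific ancestors C and A. This is the behavior the shallow annotation constraint asks for.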
