Gene Ontology semantic similarity tools: survey on features and challenges for biological knowledge discovery

Gene Ontology (GO) semantic similarity tools enable the retrieval of semantic similarity scores that incorporate the biological knowledge embedded in the GO structure, and use these scores to compare or classify proteins, or lists of proteins, on the basis of their GO annotations. This facilitates a better understanding of the biological phenomena underlying the corresponding experiment and enables the identification of processes pertinent to different biological conditions. About 14 such tools are currently available; by implementing different GO semantic similarity measures, they may play an important role in improving protein analyses at the functional level. Here we survey these tools to provide a comprehensive view of the advances and challenges in this area, so as to avoid redundant effort in developing features that already exist or in implementing ideas already shown to be obsolete in the context of GO. By describing the pertinent features of, and issues related to, each tool, the survey helps researchers, tool developers and end users understand the semantic similarity measures that the tool implements. This should empower users to make appropriate choices for their biological applications and ensure effective knowledge discovery based on GO annotations.
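As an illustration of the kind of score these tools compute, the sketch below implements one widely used information-content (IC) measure, Resnik's term similarity (the IC of the most informative common ancestor of two terms). It then combines term scores into a protein-level score with a best-match average (BMA). The ontology fragment, term names and protein annotations are hypothetical toy data, not real GO content. IC is estimated here as the negative log of the fraction of proteins annotated to a term or any of its descendants, which is only one of several possible conventions.

```python
import math

# Hypothetical toy ontology (NOT real GO): term -> list of is_a parents.
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "protein_binding": ["binding"],
    "dna_binding": ["binding"],
    "kinase": ["catalysis"],
}

# Hypothetical annotations: protein -> set of directly annotated terms.
ANNOTATIONS = {
    "P1": {"protein_binding", "kinase"},
    "P2": {"dna_binding", "kinase"},
    "P3": {"dna_binding"},
    "P4": {"protein_binding"},
    "P5": {"kinase"},
}


def ancestors(term):
    """Return the term together with all of its ancestors (true-path rule)."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


def information_content():
    """IC(t) = -log p(t), with p(t) the fraction of proteins annotated
    to t or to any descendant of t (an assumed, annotation-based estimate)."""
    counts = {t: 0 for t in PARENTS}
    for terms in ANNOTATIONS.values():
        covered = set().union(*(ancestors(t) for t in terms))
        for t in covered:
            counts[t] += 1
    n = len(ANNOTATIONS)
    # Terms that are never annotated have undefined IC and are left out.
    return {t: math.log(n / c) for t, c in counts.items() if c > 0}


IC = information_content()


def resnik(t1, t2):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(IC[t] for t in common if t in IC)


def bma(p1, p2):
    """Best-match average of Resnik scores between two proteins' terms."""
    a, b = ANNOTATIONS[p1], ANNOTATIONS[p2]
    row = sum(max(resnik(x, y) for y in b) for x in a) / len(a)
    col = sum(max(resnik(x, y) for x in a) for y in b) / len(b)
    return (row + col) / 2


if __name__ == "__main__":
    for pair in [("P1", "P1"), ("P1", "P2"), ("P3", "P5")]:
        print(pair, round(bma(*pair), 4))
```

In this toy data, P3 and P5 share only the root, so their score is 0, while P1 scores higher against itself than against P2. Real tools also differ in which GO relations they traverse, how they estimate IC, and how they combine term scores (for example maximum, average or BMA), which is why the same protein pair can receive different scores from different tools.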
