Semantic similarity analysis of protein data: assessment with biological features and issues

The integration of proteomics data with biological knowledge is a recent trend in bioinformatics. A lot of biological information is available and is spread on different sources and encoded in different ontologies (e.g. Gene Ontology). Annotating existing protein data with biological information may enable the use (and the development) of algorithms that use biological ontologies as framework to mine annotated data. Recently many methodologies and algorithms that use ontologies to extract knowledge from data, as well as to analyse ontologies themselves have been proposed and applied to other fields. Conversely, the use of such annotations for the analysis of protein data is a relatively novel research area that is currently becoming more and more central in research. Existing approaches span from the definition of the similarity among genes and proteins on the basis of the annotating terms, to the definition of novel algorithms that use such similarities for mining protein data on a proteome-wide scale. This work, after the definition of main concept of such analysis, presents a systematic discussion and comparison of main approaches. Finally, remaining challenges, as well as possible future directions of research are presented.

[1]  Mario Albrecht,et al.  FunSimMat update: new features for exploring functional similarity , 2009, Nucleic Acids Res..

[2]  Homin K. Lee,et al.  Coexpression analysis of human genes across many microarray data sets. , 2004, Genome research.

[3]  Dmitrij Frishman,et al.  MIPS: a database for genomes and protein sequences , 2000, Nucleic Acids Res..

[4]  D. Botstein,et al.  Cluster analysis and display of genome-wide expression patterns. , 1998, Proceedings of the National Academy of Sciences of the United States of America.

[5]  Phillip W. Lord,et al.  Semantic Similarity in Biomedical Ontologies , 2009, PLoS Comput. Biol..

[6]  Thomas Lengauer,et al.  A new measure for functional similarity of gene products based on Gene Ontology , 2006, BMC Bioinformatics.

[7]  Pedro M. Coutinho,et al.  Implementation of a Functional Semantic Similarity Measure between Gene-Products , 2003 .

[8]  Ying Xu,et al.  Prediction of functional modules based on comparative genome analysis and Gene Ontology application , 2005, Nucleic acids research.

[9]  David W. Conrath,et al.  Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy , 1997, ROCLING/IJCLCLP.

[10]  Catia Pesquita,et al.  Metrics for GO based protein semantic similarity: a systematic evaluation , 2008, BMC Bioinformatics.

[11]  Luay Nakhleh,et al.  GS2: an efficiently computable measure of GO-based similarity of gene sets , 2009, Bioinform..

[12]  Aaron J. Quigley,et al.  A relation based measure of semantic similarity for Gene Ontology annotations , 2008, BMC Bioinformatics.

[13]  Haiyuan Yu,et al.  Developing a similarity measure in biological function space , 2007 .

[14]  Zhenran Jiang,et al.  GO(vis), a gene ontology visualization tool based on multi-dimensional values. , 2010, Protein and peptide letters.

[15]  Carol Friedman,et al.  Information theory applied to the sparse gene ontology annotation network to predict novel gene function , 2007, ISMB/ECCB.

[16]  M. Gregory,et al.  Combining Hierarchical and Associative Gene Ontology Relations With Textual Evidence in Estimating Gene and Gene Product Similarity , 2007, IEEE Transactions on NanoBioscience.

[17]  Brad T. Sherman,et al.  Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists , 2008, Nucleic acids research.

[18]  Carole A. Goble,et al.  Semantic Similarity Measures as Tools for Exploring the Gene Ontology , 2002, Pacific Symposium on Biocomputing.

[19]  Kenneth Baclawski,et al.  Ontologies for Bioinformatics (Computational Molecular Biology) , 2005 .

[20]  Yibo Wu,et al.  GOSemSim: an R package for measuring semantic similarity among GO terms and gene products , 2010, Bioinform..

[21]  Gary D. Bader,et al.  An improved method for scoring protein-protein interactions using semantic similarity within the gene ontology , 2010, BMC Bioinformatics.

[22]  Sidahmed Benabderrahmane,et al.  IntelliGO: a new vector-based semantic similarity measure including annotation origin , 2010, BMC Bioinformatics.

[23]  Philip Resnik,et al.  Using Information Content to Evaluate Semantic Similarity in a Taxonomy , 1995, IJCAI.

[24]  Mario Cannataro,et al.  Protein-to-protein interactions: Technologies, databases, and algorithms , 2010, CSUR.

[25]  Dekang Lin,et al.  An Information-Theoretic Definition of Similarity , 1998, ICML.

[26]  Yan Zhou,et al.  Evaluation of GO-based functional similarity measures using S. cerevisiae protein interaction and expression profile data , 2008, BMC Bioinformatics.

[27]  Martha Palmer,et al.  Verb Semantics and Lexical Selection , 1994, ACL.

[28]  Olivier Bodenreider,et al.  Ontology-driven similarity approaches to supporting gene func- tional assessment , 2005 .

[29]  Aidong Zhang,et al.  Semantic integration to identify overlapping functional modules in protein interaction networks , 2007, BMC Bioinformatics.

[30]  Ioannis Xenarios,et al.  DIP: the Database of Interacting Proteins , 2000, Nucleic Acids Res..

[31]  Xiaomei Wu,et al.  Prediction of yeast protein–protein interaction network: insights from the Gene Ontology and annotations , 2006, Nucleic acids research.

[32]  Brian D. Peyser,et al.  Gene function prediction from congruent synthetic lethal interactions in yeast , 2005, Molecular systems biology.

[33]  Brad T. Sherman,et al.  The DAVID Gene Functional Classification Tool: a novel biological module-centric algorithm to functionally analyze large gene lists , 2007, Genome Biology.

[34]  Zheng Guo,et al.  Broadly predicting specific gene functions with expression similarity and taxonomy similarity. , 2005, Gene.

[35]  Hisham Al-Mubaid,et al.  Comparison of four similarity measures based on GO annotations for Gene Clustering , 2008, 2008 IEEE Symposium on Computers and Communications.

[36]  Olivier Bodenreider,et al.  Gene expression correlation and gene ontology-based similarity: an assessment of quantitative relationships , 2004, 2004 Symposium on Computational Intelligence in Bioinformatics and Computational Biology.

[37]  James M. Keller,et al.  Fuzzy Measures on the Gene Ontology for Gene Product Similarity , 2006, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[38]  Delphine Pessoa,et al.  CESSM: collaborative evaluation of semantic similarity measures , 2009 .

[39]  Angel Rubio,et al.  Correlation between Gene Expression and GO Semantic Similarity , 2005, TCBB.

[40]  Safaai Deris,et al.  A genetic similarity algorithm for searching the Gene Ontology terms and annotating anonymous protein sequences , 2008, J. Biomed. Informatics.

[41]  Christophe Dessimoz,et al.  The what, where, how and why of gene ontology—a primer for bioinformaticians , 2011, Briefings Bioinform..

[42]  Philip S. Yu,et al.  A new method to measure the semantic similarity of GO terms , 2007, Bioinform..

[43]  Gene Ontology Consortium The Gene Ontology (GO) database and informatics resource , 2003 .

[44]  Charlotte M. Deane,et al.  Functionally guided alignment of protein interaction networks for module detection , 2009, Bioinform..

[45]  Sampsa Hautaniemi,et al.  Fast Gene Ontology based clustering for microarray experiments , 2008, BioData Mining.

[46]  Catia Pesquita,et al.  Measuring coherence between electronic and manual annotations in biological databases , 2009, SAC '09.

[47]  Olivier Bodenreider,et al.  Non-Lexical Approaches to Identifying Associative Relations in the Gene Ontology , 2004, Pacific Symposium on Biocomputing.

[48]  Gang Chen,et al.  GO Semantic Similarity Based Analysis for Huaman Protein Interactions , 2009, 2009 International Joint Conference on Bioinformatics, Systems Biology and Intelligent Computing.

[49]  Alfonso Valencia,et al.  Defining functional distances over Gene Ontology , 2008, BMC Bioinformatics.

[50]  David Martin,et al.  GOToolBox: functional analysis of gene datasets based on Gene Ontology , 2004, Genome Biology.

[51]  S. Pu,et al.  Up-to-date catalogues of yeast protein complexes , 2008, Nucleic acids research.

[52]  Jing Zhu,et al.  Revealing and avoiding bias in semantic similarity scores for protein pairs , 2010, BMC Bioinformatics.

[53]  Hai Hu,et al.  Assessing semantic similarity measures for the characterization of human regulatory pathways , 2006, Bioinform..

[54]  A. I. Archakov,et al.  Technologies of protein interactomics: A review , 2011, Russian Journal of Bioorganic Chemistry.

[55]  Carole A. Goble,et al.  Investigating Semantic Similarity Measures Across the Gene Ontology: The Relationship Between Sequence and Annotation , 2003, Bioinform..

[56]  Mário J. Silva,et al.  Measuring semantic similarity between Gene Ontology terms , 2007, Data Knowl. Eng..

[57]  Anita Burgun-Parenthoine,et al.  A transversal approach to predict gene product networks from ontology-based similarity , 2007, BMC Bioinformatics.

[58]  James Zijun Wang,et al.  Effectively Integrating Information Content and Structural Relationship to Improve the GO-based Similarity Measure Between Proteins , 2010, BIOCOMP.

[59]  K. Dolinski,et al.  Use and misuse of the gene ontology annotations , 2008, Nature Reviews Genetics.

[60]  Paul Pavlidis,et al.  Gene Ontology term overlap as a measure of gene functional similarity , 2008, BMC Bioinformatics.

[61]  Steffen Staab,et al.  Taxonomy Learning - Factoring the Structure of a Taxonomy into a Semantic Classification Decision , 2002, COLING.