Functional coherence metrics in protein families

Background
Biological sequences, such as proteins, carry annotations that assign functional information. These functional annotations associate proteins (or other biological sequences) with descriptors that characterize their biological roles. However, not all proteins are fully annotated, and some are not annotated at all. This annotation incompleteness limits our ability to make sound assertions about the functional coherence within sets of proteins. It is particularly problematic when measuring the semantic functional similarity of biological sequences, since incomplete annotations can only capture a limited part of all the semantic aspects the sequences may encompass.

Methods
Instead of relying solely on single (reductive) metrics, this work proposes a comprehensive approach for assessing functional coherence within protein sets. The approach uses visualization and term enrichment techniques anchored in specific domain knowledge, such as a protein family. For that purpose, we evaluate two novel functional coherence metrics, mUI and mGIC, that combine aspects of semantic similarity measures and term enrichment.

Results
These metrics were used to effectively capture and measure the local similarity cores within protein sets. Hence, coupled with visualization tools, they give an improved grasp of three important functional annotation aspects: completeness, agreement and coherence.

Conclusions
Measuring the functional similarity between proteins based on their annotations is a nontrivial task. Several metrics exist, but owing both to characteristics intrinsic to the nature of graphs and to extrinsic factors related to the annotation process, each measure can only capture certain functional annotation aspects of proteins. Hence, when trying to measure the functional coherence of a set of proteins, a single metric is too reductive. Therefore, it is valuable to be aware of how each employed similarity metric works and which similarity aspects it can best capture. Here we test the behaviour and resilience of some similarity metrics.
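
The abstract names mUI and mGIC without defining them, so the following is background only and not the paper's method. It is a minimal Python sketch of two established pairwise semantic similarity measures, simUI (shared terms over all terms) and simGIC (the same ratio weighted by information content), whose names mUI and mGIC echo; that link is an assumption made here. The term identifiers, annotation counts and function names are invented for illustration, and each protein's term set is assumed to already include ancestor terms.

```python
import math


def information_content(term_counts, total_proteins):
    """IC(t) = -log(p(t)), where p(t) is the fraction of proteins annotated with t."""
    return {t: -math.log(c / total_proteins) for t, c in term_counts.items()}


def sim_ui(terms_a, terms_b):
    """simUI: number of shared terms divided by number of terms in the union."""
    union = terms_a | terms_b
    if not union:
        return 0.0
    return len(terms_a & terms_b) / len(union)


def sim_gic(terms_a, terms_b, ic):
    """simGIC: summed IC of shared terms divided by summed IC of the union."""
    denom = sum(ic[t] for t in terms_a | terms_b)
    if denom == 0:
        return 0.0
    return sum(ic[t] for t in terms_a & terms_b) / denom


# Hypothetical toy corpus: 100 annotated proteins and made-up term names.
counts = {"root": 100, "catalysis": 40, "binding": 50, "hydrolase": 10}
ic = information_content(counts, total_proteins=100)

# Term sets are assumed to be already propagated up to the root.
protein_a = {"root", "catalysis", "hydrolase"}
protein_b = {"root", "catalysis", "binding"}

ui = sim_ui(protein_a, protein_b)        # 2 shared terms out of 4
gic = sim_gic(protein_a, protein_b, ic)  # shared terms here are low-IC ones
print(f"simUI={ui:.3f} simGIC={gic:.3f}")
```

In this toy case the two proteins share only general terms, so the IC-weighted score comes out lower than the unweighted one. That illustrates the abstract's point that each measure captures a different aspect of the same annotations.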
