CvManGO, a method for leveraging computational predictions to improve literature-based Gene Ontology annotations

The set of annotations at the Saccharomyces Genome Database (SGD) that classifies the cellular function of S. cerevisiae gene products using Gene Ontology (GO) terms has become an important resource for facilitating experimental analysis. In addition to capturing and summarizing experimental results, the structured nature of GO annotations allows for functional comparison across organisms as well as propagation of functional predictions between related gene products. Because these annotations are relevant to many areas of research, ensuring their accuracy and quality is a priority at SGD. GO annotations are assigned either manually, by biocurators extracting experimental evidence from the scientific literature, or through automated methods that use computational algorithms to predict functional information. Here, we discuss the relationship between literature-based and computationally predicted GO annotations in SGD and extend a strategy whereby comparison of these two types of annotation identifies genes whose annotations need review. Our method, CvManGO (Computational versus Manual GO annotations), pairs literature-based GO annotations with computational GO predictions and evaluates the relationship of the two terms within GO, looking for instances of discrepancy. We found that this method can identify genes that require annotation updates, an important step towards finding ways to prioritize literature review. Additionally, we explored factors that may influence the effectiveness of CvManGO in identifying relevant gene targets, in particular genes that are missing literature-supported annotations, but our survey found no immediately identifiable criteria by which one could enrich for these under-annotated genes. Finally, we discuss possible ways to improve this strategy, and the applicability of this method to other projects that use the GO for curation. Database URL: http://www.yeastgenome.org
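The pairing step described above can be illustrated with a minimal sketch. It is not the CvManGO implementation: the toy ontology, the gene names, the relation labels and the flagging rule (flag a gene when a prediction is neither identical to nor broader than any of its manual terms) are all assumptions made for illustration, since the abstract does not specify how a discrepancy is scored.

```python
from typing import Dict, Set

# Hypothetical toy ontology: term -> direct parents (illustrative labels, not real GO IDs).
PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "transport": {"root"},
    "ion transport": {"transport"},
    "metabolism": {"root"},
}


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return every term reachable by following parent links from `term`."""
    seen: Set[str] = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def classify_pair(manual: str, predicted: str, parents: Dict[str, Set[str]]) -> str:
    """Describe how a predicted term relates to a manual term in the ontology."""
    if manual == predicted:
        return "same"
    if predicted in ancestors(manual, parents):
        return "prediction_broader"
    if manual in ancestors(predicted, parents):
        return "prediction_more_specific"
    return "different_lineage"


def flag_genes(
    manual_by_gene: Dict[str, Set[str]],
    predicted_by_gene: Dict[str, Set[str]],
    parents: Dict[str, Set[str]],
) -> Set[str]:
    """Flag genes with a prediction not already covered by any manual term.

    Assumed rule: a prediction is "covered" if some manual term is identical
    to it or more specific than it. A gene with predictions but no manual
    terms is also flagged.
    """
    flagged: Set[str] = set()
    for gene, predicted_terms in predicted_by_gene.items():
        manual_terms = manual_by_gene.get(gene, set())
        for predicted in predicted_terms:
            relations = {classify_pair(m, predicted, parents) for m in manual_terms}
            if not relations & {"same", "prediction_broader"}:
                flagged.add(gene)
                break
    return flagged


# Hypothetical annotations for three made-up genes.
manual = {"GENE1": {"transport"}, "GENE2": {"ion transport"}, "GENE3": {"metabolism"}}
predicted = {"GENE1": {"ion transport"}, "GENE2": {"transport"}, "GENE3": {"transport"}}
review_candidates = flag_genes(manual, predicted, PARENTS)
```

In this toy data, GENE1 is flagged because its prediction is more specific than its manual term, and GENE3 because the two terms lie in different lineages; GENE2 is not flagged because its manual annotation is already more specific than the prediction. A real comparison would load the GO graph and SGD annotation files and would need to account for multiple relationship types between terms.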
