A guide to best practices for Gene Ontology (GO) manual annotation

The Gene Ontology Consortium (GOC) is a community-based bioinformatics project that classifies gene product function through the use of structured controlled vocabularies. A fundamental application of the Gene Ontology (GO) is the creation of gene product annotations: evidence-based associations between GO definitions and experimental or sequence-based analysis. Currently, the GOC disseminates 126 million annotations covering >374 000 species across all the kingdoms of life. This number includes two classes of GO annotations: those created manually by experienced biocurators who review the literature or examine biological data (1.1 million annotations covering 2226 species), and those generated computationally via automated methods. Because manual annotations are often used to propagate functional predictions between related proteins within and between genomes, it is critical that they be accurate and consistent. Toward this goal, we present here the conventions defined by the GOC for the creation of manual annotations. This guide represents the best practices for manual annotation as established by the GOC project over the past 12 years. We hope it will encourage research communities to annotate the gene products of interest to them and so enhance the corpus of GO annotations available to all.

Database URL: http://www.geneontology.org
