The Gene Ontology's Reference Genome Project: A Unified Framework for Functional Annotation across Species

The Gene Ontology (GO) is a collaborative effort that provides structured vocabularies for annotating the molecular function, biological role, and cellular location of gene products. These vocabularies are applied systematically and in a species-neutral manner, with the aim of unifying the representation of gene function across organisms. Each member of the GO Consortium independently associates GO terms with gene products from the organism(s) it annotates. Here we introduce the Reference Genome project, which brings these independent efforts together into a unified framework based on the evolutionary relationships between genes in these organisms. The project has two primary goals: to increase the depth and breadth of annotations for genes in each participating organism, and to create data sets and tools that enable other genome annotation efforts to infer GO annotations for homologous genes in their own organisms. The project also brings several incidental benefits, such as greater annotation consistency across genome databases and improvements to the logical structure and biological content of the GO.
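To make the idea of inferring annotations for homologous genes concrete, here is a minimal sketch of orthology-based annotation transfer, assuming a simplified rule: every experimentally supported GO term on one member of an ortholog group is proposed for the other members that lack it. The gene identifiers, the ortholog grouping, and the `infer_annotations` helper are hypothetical and invented for illustration; the GO term IDs (GO:0006281, DNA repair; GO:0005634, nucleus) and the evidence code ISO (Inferred from Sequence Orthology) are used only as examples. This is not the project's actual pipeline, which would also require curator review and handling of paralogs and lineage-specific functions.

```python
from collections import defaultdict

# Hypothetical experimental annotations: gene ID -> set of GO term IDs.
# Gene identifiers are invented for illustration only.
experimental = {
    "human:GENE1": {"GO:0006281", "GO:0005634"},  # DNA repair, nucleus
    "yeast:GENE1": {"GO:0006281"},                # DNA repair
}

# Hypothetical ortholog groups; each group is a set of gene IDs
# assumed (for this sketch) to be orthologous to one another.
ortholog_groups = [
    {"human:GENE1", "yeast:GENE1", "fly:GENE1"},
]


def infer_annotations(experimental, ortholog_groups, evidence="ISO"):
    """Propose GO terms for genes from their orthologs' experimental terms.

    Simplified, assumed transfer rule: each term on a source gene is
    proposed for every other group member that lacks it. Each inferred
    annotation is recorded as (term, evidence code, source gene) so the
    provenance of the inference is kept.
    """
    inferred = defaultdict(set)
    for group in ortholog_groups:
        for source in group:
            for term in experimental.get(source, set()):
                for target in group - {source}:
                    # Skip terms the target already has experimental support for.
                    if term not in experimental.get(target, set()):
                        inferred[target].add((term, evidence, source))
    return dict(inferred)


if __name__ == "__main__":
    for gene, annots in sorted(infer_annotations(experimental, ortholog_groups).items()):
        for term, code, source in sorted(annots):
            print(gene, term, code, "from", source)
```

Keeping the source gene in each inferred record means a curator can trace and, if necessary, retract an inference when the source annotation changes.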
