Use and misuse of the gene ontology annotations

The Gene Ontology (GO) project is a collaboration among model organism databases to describe gene products from all organisms using a consistent and computable language. GO produces sets of explicitly defined, structured vocabularies that describe the biological processes, molecular functions and cellular components of gene products in a form that is both computer- and human-readable. Here we describe key aspects of GO that, when overlooked, can cause erroneous results, and explain how these pitfalls can be avoided.
