Gene Ontology annotation quality analysis in model eukaryotes

Functional analysis using the Gene Ontology (GO) is crucial for array analysis, but researchers often find it difficult to assess the amount and quality of the GO annotations associated with different sets of gene products. In many cases the source of the GO annotations and the date they were last updated are not apparent, which further complicates a researcher's ability to assess the quality of the GO data provided. Moreover, GO biocurators need to ensure that GO annotation quality is maintained and is optimal for the functional processes most relevant to their research community. We report the GO Annotation Quality (GAQ) score, a quantitative measure of GO quality that incorporates the breadth of GO annotation, the level of detail of the annotation and the type of evidence used to make the annotation. As a case study, we apply the GAQ scoring method to a set of diverse eukaryotes and demonstrate how the GAQ score can be used to track changes in GO annotations over time and to assess the quality of the GO annotations available for specific biological processes. The GAQ score also allows researchers to quantitatively assess the functional data available for their experimental systems (arrays or databases).
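The abstract names the three components of the GAQ score (breadth, level of detail, evidence type) but does not give a formula. The following is a minimal illustrative sketch of one way such components could be combined, not the authors' definition. It assumes that level of detail is represented by a term's depth in the GO graph, that each evidence code maps to a numeric weight, and that breadth is captured by summing over all annotations. The weight values and the names `EVIDENCE_WEIGHT`, `gaq_scores` and `mean_gaq` are hypothetical.

```python
from collections import defaultdict

# Hypothetical evidence-code weights (an assumption for illustration only):
# experimental evidence is weighted above electronic inference.
EVIDENCE_WEIGHT = {"IDA": 5, "IMP": 5, "ISS": 3, "IEA": 1}
DEFAULT_WEIGHT = 1  # fallback for codes not listed above


def gaq_scores(annotations):
    """Return an illustrative per-gene-product score.

    annotations: iterable of (gene_product, go_term_depth, evidence_code).
    Each annotation contributes depth * evidence weight, so a product with
    more annotations (breadth), deeper terms (detail) and stronger evidence
    scores higher.
    """
    per_product = defaultdict(int)
    for product, depth, evidence in annotations:
        weight = EVIDENCE_WEIGHT.get(evidence, DEFAULT_WEIGHT)
        per_product[product] += depth * weight
    return dict(per_product)


def mean_gaq(annotations, n_products):
    """Average the total score over all gene products in the set,
    including products that have no annotation at all."""
    total = sum(gaq_scores(annotations).values())
    return total / n_products


example = [
    ("P1", 4, "IDA"),  # detailed term, experimental evidence
    ("P1", 2, "IEA"),  # shallow term, electronic evidence
    ("P2", 3, "ISS"),
]
scores = gaq_scores(example)
average = mean_gaq(example, n_products=4)  # two products are unannotated
```

Dividing by the size of the whole gene-product set, rather than by the number of annotated products, lets unannotated products lower the average. This is one way to make scores comparable across arrays or databases of different sizes, or across releases of the same annotation set over time.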
