AIGO: Towards a unified framework for the Analysis and the Inter-comparison of GO functional annotations

Background: In response to the rapid growth of available genome sequences, efforts have been made to develop automatic inference methods to functionally characterize them. Pipelines that infer functional annotations are now routinely used to produce new annotations at genome scale and for a broad variety of species. These pipelines differ widely in their inference algorithms, confidence thresholds and data sources for reasoning. This heterogeneity makes comparing the relative merits of each approach extremely complex. Evaluating the quality of the resulting annotations is also challenging, given that there is often no existing gold standard against which to measure precision and recall.

Results: In this paper, we present a pragmatic approach to the study of functional annotations. An ensemble of 12 metrics, describing various aspects of functional annotations, is defined and implemented in a unified framework that facilitates their systematic analysis and inter-comparison. The use of this framework is demonstrated on three illustrative examples: analysing the outputs of state-of-the-art inference pipelines, comparing electronic versus manual annotation methods, and monitoring the evolution of publicly available functional annotations. The framework is part of the AIGO library (http://code.google.com/p/aigo) for the Analysis and the Inter-comparison of the products of Gene Ontology (GO) annotation pipelines. The AIGO library also provides functionality to load, analyse, manipulate and compare functional annotations, and to plot and export the results of the analysis in various formats.

Conclusions: This work is a step toward developing a unified framework for the systematic study of GO functional annotations. The framework has been designed so that new metrics on GO functional annotations can be added in a straightforward way.
