Gene Ontology: Pitfalls, Biases, and Remedies

The Gene Ontology (GO) is a formidable resource, but using its data and interpreting it correctly requires awareness of several considerations. The GO is simple enough to be used without a deep understanding of its structure or of how it is developed, which is both a strength and a weakness. In this chapter, we discuss common misinterpretations of the ontology and its annotations; a better understanding of the pitfalls and biases in the GO should help users make the most of this very rich resource. We review misconceptions and misleading assumptions commonly made about the GO, including the effect of data incompleteness, the importance of annotation qualifiers, and the transitivity, or lack thereof, of different ontology relations. We also discuss several biases that can confound aggregate analyses such as gene enrichment analyses. For each of these pitfalls and biases, we suggest remedies and best practices.
