GOATOOLS: A Python library for Gene Ontology analyses

The biological interpretation of gene lists with interesting shared properties, such as up- or down-regulation in a particular experiment, is typically accomplished using Gene Ontology (GO) enrichment analysis tools. Given a list of genes, a GO enrichment analysis may return hundreds of statistically significant GO terms in a “flat” list, which can be challenging to summarize. It can also be difficult to keep pace with rapidly expanding biological knowledge, which often results in daily changes to any of the over 47,000 GO terms that describe it. GOATOOLS, a Python-based library, makes it more efficient to stay current with the latest ontologies and annotations, perform GO enrichment analyses to determine over- and under-represented terms, and organize results for greater clarity and easier interpretation using a novel GOATOOLS GO grouping method. We performed functional analyses on both stochastic simulation data and real data from a published RNA-seq study to compare the enrichment results from GOATOOLS to those of two other popular tools: DAVID and GOstats. GOATOOLS is freely available through GitHub: https://github.com/tanghaibao/goatools.
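
To make the idea of over- and under-represented terms concrete, the following is a minimal sketch of the statistical test that commonly underlies a GO enrichment analysis: a two-sided Fisher's exact test on the number of study genes annotated to a single GO term. It is not the GOATOOLS implementation. The function names `hypergeom_pmf` and `enrichment_test` and the example counts are hypothetical, chosen only for illustration, and a real analysis would also apply a multiple-testing correction across all tested terms.

```python
from math import comb


def hypergeom_pmf(k, pop_size, pop_hits, study_size):
    """Probability of seeing exactly k annotated genes in the study set.

    pop_size   -- number of genes in the background population
    pop_hits   -- population genes annotated to the GO term
    study_size -- number of genes in the study set
    """
    return (comb(pop_hits, k) * comb(pop_size - pop_hits, study_size - k)
            / comb(pop_size, study_size))


def enrichment_test(study_hits, study_size, pop_hits, pop_size):
    """Two-sided Fisher's exact test for a single GO term.

    Returns (direction, p_value), where direction is "over" if the term is
    more frequent in the study set than in the population, else "under".
    """
    # Range of annotated-gene counts that are possible in the study set.
    k_min = max(0, study_size - (pop_size - pop_hits))
    k_max = min(study_size, pop_hits)

    p_observed = hypergeom_pmf(study_hits, pop_size, pop_hits, study_size)

    # Sum the probabilities of all outcomes at least as unlikely as the
    # observed one; the small tolerance guards against rounding error.
    p_value = 0.0
    for k in range(k_min, k_max + 1):
        p_k = hypergeom_pmf(k, pop_size, pop_hits, study_size)
        if p_k <= p_observed * (1 + 1e-7):
            p_value += p_k

    study_ratio = study_hits / study_size
    pop_ratio = pop_hits / pop_size
    direction = "over" if study_ratio > pop_ratio else "under"
    return direction, min(1.0, p_value)


# Hypothetical counts: 4 of 5 study genes carry the term,
# versus 5 of 20 genes in the population.
direction, p_value = enrichment_test(study_hits=4, study_size=5,
                                     pop_hits=5, pop_size=20)
print(direction, p_value)
```

Using exact integer binomial coefficients keeps the sketch free of external dependencies; for large populations, a library routine such as a hypergeometric distribution from a statistics package would be the more practical choice.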
