Interpretation of biological experiments changes with evolution of the Gene Ontology and its annotations

Gene Ontology (GO) enrichment analysis is ubiquitously used for interpreting high-throughput molecular data and generating hypotheses about the biological phenomena underlying experiments. However, the two building blocks of this analysis, the ontology and the annotations, evolve rapidly. We used gene signatures derived from 104 disease analyses to systematically evaluate how enrichment analysis results were affected by the evolution of the GO over a decade. We found low consistency between enrichment analysis results obtained with early and more recent GO versions. Furthermore, the GO annotations remain strongly biased: 58% of the annotations are for 16% of the human genes. Our analysis suggests that GO evolution may have affected the interpretation, and possibly the reproducibility, of experiments over time. Hence, researchers must exercise caution when interpreting GO enrichment analyses and should reexamine previous analyses with the most recent GO version.
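The two quantities mentioned above, consistency between enrichment results and concentration of annotations on few genes, can be illustrated with a minimal sketch. It assumes that consistency is measured as the Jaccard index between the sets of significantly enriched terms from two GO versions; the abstract does not state which measure the study used. The function names `jaccard` and `annotation_share` are hypothetical, and all inputs are toy values, not data from the study.

```python
def jaccard(terms_a, terms_b):
    """Jaccard index between two collections of enriched GO term IDs."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    if not union:
        return 1.0  # two empty result sets are treated as identical
    return len(a & b) / len(union)


def annotation_share(annotations_per_gene, top_fraction):
    """Fraction of all annotations held by the most-annotated genes.

    annotations_per_gene maps a gene symbol to its annotation count;
    top_fraction is the share of genes to consider (0.16 for 16%).
    """
    counts = sorted(annotations_per_gene.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    k = max(1, round(top_fraction * len(counts)))
    return sum(counts[:k]) / total


# Toy example: terms enriched for one signature under two GO versions.
early = {"GO:0006955", "GO:0006954", "GO:0008283"}
recent = {"GO:0006955", "GO:0045087", "GO:0051607", "GO:0008283"}
consistency = jaccard(early, recent)  # 2 shared terms out of 5 distinct

# Toy example: annotation counts for five hypothetical genes.
toy_counts = {"A": 50, "B": 30, "C": 10, "D": 5, "E": 5}
share = annotation_share(toy_counts, 0.2)  # share held by the top 20% of genes
```

A low `consistency` value for the same gene signature would indicate that the interpretation depends on the GO version used, and a high `share` for a small `top_fraction` would indicate annotation bias of the kind the abstract reports.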
