Cell, Chemical and Anatomical Views of the Gene Ontology: Mapping to a Roche Controlled Vocabulary

The Gene Ontology (GO) consists of around 40,000 terms referring to classes of biological process, cell component and gene product activity. It has been used to annotate the functions and locations of several million gene products. Much pharmacological research focuses on understanding, in molecular terms, how disease conditions differ from physiological conditions, with the aim of finding new drug targets for therapy. Gene set enrichment analysis (GSEA) using the GO and its annotations provides a powerful way to assess those differences. Roche has developed a bespoke controlled vocabulary (RCV) to support enrichment analysis. Each RCV term is manually mapped to a list of GO terms. The groupings are tailored to the research aims of Roche and, as a result, many fall outside the scope of GO classes. For example, many RCV terms group processes and cell parts according to the cell type they occur in. The manual mapping strategy is labour intensive and hard to sustain as the GO evolves. We have automated mappings between the RCV and the GO via OWL-EL queries. This is made possible by extensive axiomatisation linking the GO to ontologies of cells, anatomical entities and chemicals. We can fully automate mapping for approximately one third of the terms in the RCV, with another 40% having 10 or fewer GO terms requiring manual mapping. Automated mapping uncovers many mappings missing from the manual set. GSEA using the resulting semi-automated mapping of RCV to GO detects enrichment of gene sets that is missed with the manual-only mapping. The OWL query approach we describe can be used as the basis of new ways to query the GO, group annotations and carry out GSEA. Importantly, it allows the classifications used in enrichment analysis to be much more closely tailored to the needs of researchers and industry than was previously possible.
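
The idea of grouping GO terms by the cell type they occur in can be illustrated with a minimal sketch. It is not the authors' pipeline, which runs OWL-EL queries with a reasoner over the real ontologies. Here the ontology is a toy one, the term labels and axioms are invented for illustration, and the `occurs_in` relation name and the `query_occurs_in` helper are assumptions. The query approximates a class expression of the form "occurs_in some C" by collecting every term whose own or inherited `occurs_in` filler is C or a subclass of C.

```python
# Toy ontology: labels and axioms are invented for illustration only.
# IS_A maps each term to its direct superclasses.
IS_A = {
    "alpha-beta T cell": ["T cell"],
    "T cell": ["lymphocyte"],
    "B cell": ["lymphocyte"],
    "alpha-beta T cell activation": ["T cell activation"],
    "T cell activation": ["lymphocyte activation"],
    "B cell activation": ["lymphocyte activation"],
}

# OCCURS_IN maps a process to the cell type it is asserted to occur in
# (a stand-in for an existential restriction: occurs_in some <cell type>).
OCCURS_IN = {
    "lymphocyte activation": "lymphocyte",
    "T cell activation": "T cell",
    "alpha-beta T cell activation": "alpha-beta T cell",
    "B cell activation": "B cell",
}


def ancestors(term):
    """Return the term plus all of its superclasses (reflexive closure)."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in IS_A.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def query_occurs_in(cell_type):
    """Approximate the query 'occurs_in some cell_type' on the toy data."""
    # Terms whose asserted filler is cell_type or one of its subclasses.
    direct = {p for p, c in OCCURS_IN.items() if cell_type in ancestors(c)}
    # Subclasses of those terms inherit the restriction.
    all_terms = set(IS_A) | set(OCCURS_IN)
    return sorted(t for t in all_terms if ancestors(t) & direct)


if __name__ == "__main__":
    print(query_occurs_in("T cell"))
    print(query_occurs_in("lymphocyte"))
```

In this sketch a vocabulary term such as "T cell processes" would be defined once as a query, and its list of mapped terms would be recomputed whenever the underlying ontology changes, rather than maintained by hand.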
