Ontology application and use at the ENCODE DCC

The Encyclopedia of DNA Elements (ENCODE) project is an ongoing collaborative effort to create a catalog of genomic annotations. To date, the project has generated over 4000 experiments across more than 350 cell lines and tissues, using a wide array of experimental techniques to study the chromatin structure, regulatory network and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All ENCODE experimental data, metadata and associated computational analyses are submitted to the ENCODE Data Coordination Center (DCC) for validation, tracking, storage and distribution to community resources and the scientific community. As the volume of data increases, organizing experimental details becomes more complicated and demands careful curation to identify related experiments. Here, we describe the ENCODE DCC’s use of ontologies to standardize experimental metadata. We discuss how annotating metadata with ontology terms improves searching and makes it easier to find connections within a set of experiments. Additionally, we provide examples of how ontologies are used to annotate ENCODE metadata and how the annotations can be identified via ontology-driven searches at the ENCODE portal. As genomic datasets grow larger and more interconnected, standardization of metadata becomes increasingly vital to allow for exploration and comparison of data between different scientific projects. Database URL: https://www.encodeproject.org/
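
As a minimal illustration of why ontology annotation helps searching, the sketch below shows a query for a broad term also retrieving experiments annotated with more specific terms beneath it. It is not the ENCODE portal's implementation. The term identifiers (TOY:...), the experiment names (EXP1 to EXP3) and the functions ancestors and search are all hypothetical, and the hierarchy is a toy stand-in for a real ontology.

```python
# Toy ontology: each term maps to its direct parent terms.
# All identifiers here are hypothetical, not real ontology IDs.
PARENTS = {
    "TOY:hepatocyte": ["TOY:liver"],
    "TOY:liver": ["TOY:digestive_system"],
    "TOY:neuron": ["TOY:brain"],
    "TOY:brain": ["TOY:nervous_system"],
}

# Hypothetical experiment metadata: experiment name -> annotated term.
EXPERIMENTS = {
    "EXP1": "TOY:hepatocyte",
    "EXP2": "TOY:liver",
    "EXP3": "TOY:neuron",
}


def ancestors(term):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in PARENTS.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def search(term):
    """Return experiments annotated with `term` or with any of its descendants."""
    return sorted(
        name
        for name, annotated in EXPERIMENTS.items()
        if annotated == term or term in ancestors(annotated)
    )


if __name__ == "__main__":
    print(search("TOY:liver"))       # EXP1 (hepatocyte) is found via the hierarchy
    print(search("TOY:hepatocyte"))  # only the exact annotation matches
```

A plain text match on "liver" would miss the experiment annotated only as a hepatocyte sample; following the term hierarchy is what connects the two.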
