Principles of metadata organization at the ENCODE data coordination center

The Encyclopedia of DNA Elements (ENCODE) Data Coordinating Center (DCC) is responsible for organizing, describing and providing access to the diverse data generated by the ENCODE project. The description of these data, known as metadata, includes the biological sample used as input, the protocols and assays performed on these samples, the data files generated from the results and the computational methods used to analyze the data. Here, we outline the principles and philosophy used to define the ENCODE metadata in order to create a metadata standard that can be applied to diverse assays and multiple genomic projects. In addition, we present how the data are validated and used by the ENCODE DCC in creating the ENCODE Portal (https://www.encodeproject.org/). Database URL: www.encodeproject.org

[1]  Ting Wang,et al.  ENCODE whole-genome data in the UCSC Genome Browser , 2009, Nucleic Acids Res..

[2]  J. Michael Cherry,et al.  Ontology application and use at the ENCODE DCC , 2015, Database J. Biol. Databases Curation.

[3]  Shane J. Neph,et al.  A comparative encyclopedia of DNA elements in the mouse genome , 2014, Nature.

[4]  William Stafford Noble,et al.  Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project , 2007, Nature.

[5]  Raymond K. Auerbach,et al.  Integrative Analysis of the Caenorhabditis elegans Genome by the modENCODE Project , 2010, Science.

[6]  Gregory D. Schuler,et al.  Database resources of the National Center for Biotechnology Information: update , 2004, Nucleic acids research.

[7]  Robert Petryszak,et al.  ArrayExpress update—simplifying data submissions , 2014, Nucleic Acids Res..

[8]  Evan Bolton,et al.  Database resources of the National Center for Biotechnology Information , 2017, Nucleic Acids Res..

[9]  Gos Micklem,et al.  Supporting Online Material Materials and Methods Figs. S1 to S50 Tables S1 to S18 References Identification of Functional Elements and Regulatory Circuits by Drosophila Modencode , 2022 .

[10]  Paul T. Spellman,et al.  A simple spreadsheet-based, MIAME-supportive format for microarray data: MAGE-TAB , 2006, BMC Bioinformatics.

[11]  Oliver Hofmann,et al.  ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level , 2010, Bioinform..

[12]  T. Mikkelsen,et al.  The NIH Roadmap Epigenomics Mapping Consortium , 2010, Nature Biotechnology.

[13]  Howard Y. Chang,et al.  Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position , 2013, Nature Methods.

[14]  Christoph Steinbeck,et al.  The ChEBI reference database and ontology for biologically relevant chemistry: enhancements for 2013 , 2012, Nucleic Acids Res..

[15]  Jason E. Stewart,et al.  Minimum information about a microarray experiment (MIAME)—toward standards for microarray data , 2001, Nature Genetics.

[16]  Sergio Contrino,et al.  The modENCODE Data Coordination Center: lessons in harvesting comprehensive experimental details , 2011, Database J. Biol. Databases Curation.

[17]  Data production leads,et al.  An integrated encyclopedia of DNA elements in the human genome , 2012 .

[18]  Marc D. Perry,et al.  ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia , 2012, Genome research.

[19]  M. Ashburner,et al.  An ontology for cell types , 2005, Genome Biology.

[20]  Jessica A. Turner,et al.  Modeling biomedical experimental processes with OBI , 2010, J. Biomed. Semant..

[21]  S. Lewis,et al.  Uberon, an integrative multi-species anatomy ontology , 2012, Genome Biology.

[22]  Anne E. Trefethen,et al.  Toward interoperable bioscience data , 2012, Nature Genetics.

[23]  Sean R. Davis,et al.  NCBI GEO: archive for functional genomics data sets—update , 2012, Nucleic Acids Res..

[24]  J. Michael Cherry,et al.  ENCODE data at the ENCODE portal , 2015, Nucleic Acids Res..

[25]  ENCODEConsortium,et al.  An Integrated Encyclopedia of DNA Elements in the Human Genome , 2012, Nature.

[26]  Michael Q. Zhang,et al.  Integrative analysis of 111 reference human epigenomes , 2015, Nature.

[27]  Anna Zhukova,et al.  Modeling sample variables with an Experimental Factor Ontology , 2010, Bioinform..