*-DCC: A platform to collect, annotate, and explore a large variety of sequencing experiments

Abstract

Background: Over the past few years, the variety of experimental designs and protocols for sequencing experiments has increased greatly. To ensure that the resulting data remain usable beyond an individual project, rich and systematic annotation of the underlying experiments is crucial.

Findings: We first developed an annotation structure that captures the overall experimental design as well as the relevant details of each step, from the biological sample through library preparation and the sequencing procedure to the sequencing and processed files. Design features such as controlled vocabularies and differing field requirements ensure high annotation quality, comparability, and ease of annotation. The structure can be easily adapted to a large variety of species. We then implemented the annotation strategy in a user-hosted web platform with data import, query, and export functionality.

Conclusions: We present an annotation structure and a user-hosted platform for sequencing experiment data, suitable for lab-internal documentation, collaborations, and large-scale annotation efforts.
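To make the idea of controlled vocabularies and field requirements concrete, the following is a minimal sketch in Python, not the platform's actual implementation. All field names, vocabulary terms, and the `validate_annotation` helper are hypothetical and chosen only for illustration; the text does not specify the real schema.

```python
from typing import Dict, List

# Hypothetical controlled vocabularies: each listed field only accepts these terms.
# The real term lists used by the platform are not given in the text.
CONTROLLED_VOCABULARIES = {
    "assay_type": {"RNA-seq", "ChIP-seq", "ATAC-seq"},
    "species": {"Danio rerio", "Homo sapiens"},
}

# Hypothetical field requirements: True marks a mandatory field, False an optional one.
FIELD_REQUIREMENTS = {
    "assay_type": True,
    "species": True,
    "library_prep_kit": False,
}


def validate_annotation(record: Dict[str, str]) -> List[str]:
    """Return a list of problems found in one annotation record (empty if valid)."""
    errors = []
    for name, required in FIELD_REQUIREMENTS.items():
        value = record.get(name)
        if value is None or value == "":
            # Only mandatory fields produce an error when left empty.
            if required:
                errors.append(f"missing required field: {name}")
            continue
        allowed = CONTROLLED_VOCABULARIES.get(name)
        # Free-text fields have no vocabulary and accept any value.
        if allowed is not None and value not in allowed:
            errors.append(f"{name}: '{value}' is not in the controlled vocabulary")
    return errors


if __name__ == "__main__":
    print(validate_annotation({"assay_type": "RNA-seq", "species": "Danio rerio"}))
    print(validate_annotation({"assay_type": "rnaseq"}))
```

Restricting values to fixed terms is what makes records comparable across contributors, while optional fields keep the annotation burden low for details that are not always known.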
