The Global Genome Biodiversity Network (GGBN) Data Standard specification

Genomic samples of non-model organisms are becoming increasingly important in a broad range of studies, from developmental biology and biodiversity analyses to conservation. Genomic sample definition, description, quality, voucher information and metadata all need to be digitized and disseminated across scientific communities. In today's bioinformatics era of ever-increasing data, this information needs to be concise and consistent so that complementary data aggregators can easily map databases to one another. To facilitate the exchange of information on genomic samples and their derived data, the Global Genome Biodiversity Network (GGBN) Data Standard is intended to provide a platform, based on a documented agreement, that promotes the efficient and consistent sharing and usage of genomic sample material and associated specimen information. The new data standard presented here builds upon existing standards commonly used within the community, extending them with the capability to exchange data on tissue, environmental and DNA samples as well as sequences. The GGBN Data Standard will reveal and democratize the hidden contents of biodiversity biobanks for the benefit of everyone in the wider biobanking community. Technical tools exist for data providers to easily map their databases to the standard.

Database URL: http://terms.tdwg.org/wiki/GGBN_Data_Standard
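Mapping a local database to a shared standard largely means renaming local fields to agreed term names. The following is a minimal sketch of that idea, not one of the GGBN tools themselves. It assumes a flat local record stored as a Python dictionary; the local column names, the made-up record and the helper `map_record` are hypothetical, and while the `dwc:` names are Darwin Core terms, the `ggbn:` names are illustrative assumptions that should be checked against the published GGBN term list.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from a local biobank table's column names to
# standard term names. The dwc: names are Darwin Core terms; the ggbn:
# names are illustrative assumptions, not verified GGBN terms.
FIELD_MAP: Dict[str, str] = {
    "cat_no": "dwc:catalogNumber",
    "taxon": "dwc:scientificName",
    "sample_id": "dwc:materialSampleID",
    "sample_kind": "ggbn:materialSampleType",  # assumed term name
    "dna_conc_ng_ul": "ggbn:concentration",     # assumed term name
}


def map_record(record: Dict[str, str],
               field_map: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Rename the fields of one local record to standard term names.

    Returns the mapped record plus a sorted list of local fields that had
    no mapping, so a data provider can see what would not be published.
    """
    mapped: Dict[str, str] = {}
    unmapped: List[str] = []
    for local_name, value in record.items():
        term = field_map.get(local_name)
        if term is None:
            unmapped.append(local_name)
        elif value != "":
            # Skip empty values rather than publishing blank terms.
            mapped[term] = value
    return mapped, sorted(unmapped)


if __name__ == "__main__":
    # A made-up local record, used only to exercise the mapping.
    row = {
        "cat_no": "DNA-00123",
        "taxon": "Aeshna cyanea",
        "sample_id": "TISSUE-0456",
        "sample_kind": "DNA",
        "dna_conc_ng_ul": "",
        "freezer_rack": "R7",
    }
    mapped, unmapped = map_record(row, FIELD_MAP)
    print(mapped)
    print("unmapped:", unmapped)
```

Returning the unmapped field names alongside the mapped record is one way to let a provider review gaps before publishing; real mapping tools may handle this differently.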