BCdatabaser: on-the-fly reference database creation for (meta-)barcoding

SUMMARY DNA barcoding and meta-barcoding have become indispensable in research and applications that require identifying a single taxon or the taxa within a mixture, respectively. Pioneering studies were conducted in a microbiological context, but plants and animals are now increasingly targeted as well. Given the variety of markers in use, the formatting requirements of classifiers and the constant growth of primary databases, there is a need for dedicated reference database creation. We developed a web and command line interface that generates such databases on the fly for any applicable marker and taxonomic group, with optional filtering, formatting and restriction specific to (meta-)barcoding purposes. Databases can also optionally receive a DOI, which makes them well documented with metadata, publicly shareable and citable.

AVAILABILITY Source code: https://www.github.com/molbiodiv/bcdatabaser; web service: https://bcdatabaser.molecular.eco; documentation: https://molbiodiv.github.io/bcdatabaser.
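To make the idea of marker- and taxon-specific database creation more concrete, the following is a minimal sketch of two steps such a workflow might involve: composing a search term for a primary sequence database and formatting a record header for a classifier. It is not taken from the BCdatabaser source code. The function names `build_query` and `sintax_header`, the default rank prefixes and the example accession are hypothetical; the field tags (`[Title]`, `[Organism]`, `[SLEN]`) follow NCBI Entrez search syntax, and the header layout follows the `;tax=...;` annotation style used by SINTAX-type classifiers.

```python
from typing import Mapping, Optional

# Assumed rank-to-prefix mapping for SINTAX-style taxonomy annotations.
RANK_PREFIXES = (
    ("kingdom", "k"),
    ("phylum", "p"),
    ("class", "c"),
    ("order", "o"),
    ("family", "f"),
    ("genus", "g"),
    ("species", "s"),
)


def build_query(
    marker: str,
    taxon: str,
    min_len: Optional[int] = None,
    max_len: Optional[int] = None,
) -> str:
    """Compose an Entrez-style search term for a marker within a taxonomic group.

    The optional length range illustrates one kind of filtering; it is only
    added when both bounds are given.
    """
    terms = [f"({marker}[Title])", f"({taxon}[Organism])"]
    if min_len is not None and max_len is not None:
        terms.append(f"({min_len}:{max_len}[SLEN])")
    return " AND ".join(terms)


def sintax_header(accession: str, lineage: Mapping[str, str]) -> str:
    """Format a FASTA header with a SINTAX-style taxonomy annotation.

    Ranks missing from `lineage` are skipped; spaces in names become underscores.
    """
    parts = [
        f"{prefix}:{lineage[rank].replace(' ', '_')}"
        for rank, prefix in RANK_PREFIXES
        if rank in lineage
    ]
    return f">{accession};tax={','.join(parts)};"


if __name__ == "__main__":
    # Hypothetical example: ITS2 sequences of green plants, 200 to 2000 bp long.
    print(build_query("ITS2", "Viridiplantae", 200, 2000))
    # Hypothetical accession and lineage, for illustration only.
    print(sintax_header("X1", {"kingdom": "Viridiplantae",
                               "genus": "Bellis",
                               "species": "Bellis perennis"}))
```

Keeping query construction and header formatting as separate pure functions means each can be checked without network access; actually retrieving sequences and resolving lineages would require further steps that are not shown here.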