BusyBee Web: metagenomic data analysis by bootstrapped supervised binning and annotation

Abstract

Metagenomics-based studies of mixed microbial communities are impacting biotechnology, the life sciences and medicine. Computational binning of metagenomic data is a powerful approach for the culture-independent recovery of population-resolved genomic sequences, i.e. sequences from individual or closely related constituent microorganisms. Existing binning solutions often require a priori characterized reference genomes and/or dedicated compute resources. Extending currently available reference-independent binning tools, we developed the BusyBee Web server for the automated deconvolution of metagenomic data into population-level genomic bins using assembled contigs (Illumina) or long reads (Pacific Biosciences, Oxford Nanopore Technologies). A reversible compression step and bootstrapped supervised binning enable quick turnaround times. The binning results are displayed in interactive 2D scatterplots. In addition, bin quality estimates, taxonomic annotations and annotations of antibiotic resistance genes are computed and visualized. Ground-truth-based benchmarks show that BusyBee Web performs comparably to state-of-the-art binning solutions on assembled contigs and markedly better on long reads (median F1 scores: 70.02–95.21%). Furthermore, its applicability to real-world metagenomic datasets is demonstrated. In conclusion, our reference-independent approach automatically bins assembled contigs or long reads, exhibits high sensitivity and precision, enables intuitive inspection of the results and requires only FASTA-formatted input. The web-based application is freely accessible at: https://ccb-microbe.cs.uni-saarland.de/busybee.
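The abstract describes "bootstrapped supervised binning" only at a high level. One plausible reading is to cluster a random subsample of sequences without supervision and then use the resulting cluster labels to train a classifier that assigns every sequence, sampled or not, to a bin. The minimal Python sketch below illustrates that idea. It is not BusyBee Web's implementation. The tetranucleotide frequency features, the plain DBSCAN clustering, the k-nearest-neighbour classifier and all names and parameters (kmer_profile, dbscan, knn_predict, bootstrapped_binning, sample_size, eps, min_pts) are assumptions chosen for brevity. The compression and 2D embedding steps are omitted.

```python
import itertools
import math
import random
from collections import Counter

# Hypothetical sketch, NOT BusyBee Web's actual implementation.
# Assumed components: tetranucleotide frequencies as features, plain DBSCAN
# for the unsupervised step, and k-nearest neighbours as the supervised step.


def kmer_profile(seq, k=4):
    """Relative frequencies of all 4**k DNA k-mers; k-mers with non-ACGT symbols are ignored."""
    seq = seq.upper()
    kmers = ["".join(p) for p in itertools.product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers) or 1  # avoid division by zero for short input
    return [counts[km] / total for km in kmers]


def region_query(points, i, eps):
    """Indices of all points within Euclidean distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Plain O(n^2) DBSCAN; returns one cluster label per point, -1 for noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1  # noise for now; may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # only core points expand the cluster
                queue.extend(j_neighbors)
    return labels


def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def bootstrapped_binning(seqs, sample_size, eps, min_pts, k=4, seed=0):
    """Cluster a random subsample, then train on it to assign every sequence to a bin."""
    profiles = [kmer_profile(s, k) for s in seqs]
    rng = random.Random(seed)
    sample = rng.sample(range(len(seqs)), min(sample_size, len(seqs)))
    # 1) Unsupervised step on the subsample only.
    sample_labels = dbscan([profiles[i] for i in sample], eps, min_pts)
    # 2) Clustered (non-noise) subsample points become the training set.
    train = [(profiles[i], lab) for i, lab in zip(sample, sample_labels) if lab != -1]
    if not train:
        return [-1] * len(seqs)  # nothing to learn from: leave everything unbinned
    train_x, train_y = zip(*train)
    # 3) Supervised step: classify all sequences, including the unsampled ones.
    return [knn_predict(train_x, train_y, p) for p in profiles]


if __name__ == "__main__":
    # Synthetic toy data: 20 AT-rich and 20 GC-rich random sequences.
    rng = random.Random(1)
    at_rich = ["".join(rng.choices("ATATATATGC", k=3000)) for _ in range(20)]
    gc_rich = ["".join(rng.choices("GCGCGCGCAT", k=3000)) for _ in range(20)]
    print(bootstrapped_binning(at_rich + gc_rich, sample_size=30, eps=0.1, min_pts=3))
```

On this toy input the two composition groups should end up in two separate bins. Only the subsample passes through the quadratic clustering step, and the remaining sequences are handled by the cheaper classifier. This is one way such a scheme could shorten turnaround times on large inputs.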
