Binning unassembled short reads based on k-mer covariance in a sparse coding framework

Sequence binning techniques enable the recovery of a growing number of genomes from complex microbial metagenomes, but they typically require prior metagenome assembly and thereby inherit its computational cost and drawbacks, e.g. biases against low-abundance genomes and the inability to conveniently assemble multi-terabyte datasets. We present here a pre-assembly binning scheme (i.e. one operating on unassembled short reads) that enables the recovery of latent genomes by leveraging sparse dictionary learning and elastic-net regularization. We demonstrate its efficiency and scalability by recovering hundreds of metagenome-assembled genomes, including very low-abundance genomes, from a joint analysis of microbiomes from the LifeLines-Deep population cohort (n = 1135, > 10¹⁰ reads, 10 terabytes of sequence data).
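The core idea of binning by k-mer covariance can be illustrated with a toy example. Each k-mer gets an abundance profile across samples. That profile is then expressed as a sparse combination of a few latent "genome" profiles, and the k-mer goes into the bin of its dominant latent genome. The plain-Python sketch below is not the authors' implementation and rests on several assumptions:

- The dictionary of latent profiles is taken as given. In the described scheme it would itself be learned by sparse dictionary learning.
- The elastic-net problem is solved by cyclic coordinate descent with a non-negativity constraint.
- The function names (`count_kmers`, `abundance_profiles`, `elastic_net_code`, `assign_bin`) and the penalty values are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer in a list of reads (no reverse-complement canonicalization)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def abundance_profiles(samples, k):
    """Map each k-mer to its count in every sample: {kmer: [c_0, c_1, ...]}."""
    per_sample = [count_kmers(reads, k) for reads in samples]
    kmers = sorted(set().union(*per_sample))
    # Counter returns 0 for k-mers absent from a sample
    return {km: [c[km] for c in per_sample] for km in kmers}


def elastic_net_code(x, atoms, l1=0.1, l2=0.1, n_iter=200):
    """Minimize 0.5*||x - sum_j a_j d_j||^2 + l1*sum_j a_j + 0.5*l2*||a||^2
    subject to a >= 0 (assumed, since abundances are non-negative),
    by cyclic coordinate descent. `atoms` is a list of dictionary columns d_j."""
    a = [0.0] * len(atoms)
    residual = [float(v) for v in x]  # x - D a, with a = 0 initially
    sq_norms = [sum(v * v for v in d) for d in atoms]
    for _ in range(n_iter):
        for j, d in enumerate(atoms):
            if sq_norms[j] == 0.0:
                continue
            # correlation of atom j with the residual that excludes atom j
            rho = sum(r * v for r, v in zip(residual, d)) + sq_norms[j] * a[j]
            # soft-threshold (L1), shrink (L2), clip at zero (non-negativity)
            new = max(0.0, rho - l1) / (sq_norms[j] + l2)
            delta = new - a[j]
            if delta != 0.0:
                residual = [r - delta * v for r, v in zip(residual, d)]
                a[j] = new
    return a


def assign_bin(profile, atoms, **kw):
    """Return the index of the dominant latent genome, or None if the code is all zero."""
    code = elastic_net_code(profile, atoms, **kw)
    best = max(range(len(code)), key=code.__getitem__)
    return best if code[best] > 0 else None


if __name__ == "__main__":
    # two hypothetical latent genomes observed across four samples
    atoms = [[4.0, 0.0, 2.0, 1.0], [0.0, 3.0, 1.0, 5.0]]
    print(assign_bin([8, 0, 4, 2], atoms))   # profile co-varies with genome 0
    print(assign_bin([0, 6, 2, 10], atoms))  # profile co-varies with genome 1
```

In this sketch, the L1 term keeps each k-mer's code sparse, so a k-mer is attributed to few latent genomes. The L2 term stabilizes the coefficients when two latent profiles are correlated, which a pure lasso penalty handles poorly.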
