High-quality genome sequences of uncultured microbes by assembly of read clouds

Although shotgun metagenomic sequencing of microbiome samples enables partial reconstruction of strain-level community structure, obtaining high-quality microbial genome drafts without isolation and culture remains difficult. Here, we present an application of read clouds, short-read sequences tagged with long-range information, to microbiome samples. We present Athena, a de novo assembler that uses read clouds to improve metagenomic assemblies. We applied this approach to sequence stool samples from two healthy individuals and compared it with existing short-read and synthetic long-read metagenomic sequencing techniques. Read-cloud metagenomic sequencing and Athena assembly produced the most comprehensive individual genome drafts with high contiguity (>200-kb N50, fewer than ten contigs), even for bacteria with relatively low (20×) raw short-read-sequence coverage. We also sequenced a complex marine-sediment sample and generated 24 intermediate-quality genome drafts (>70% complete, <10% contaminated), nine of which were complete (>90% complete, <5% contaminated). Our approach allows for culture-free generation of high-quality microbial genome drafts by using a single shotgun experiment.

[1]  Pavel A Pevzner,et al.  TruSPAdes: barcode assembly of TruSeq synthetic long reads , 2016, Nature Methods.

[2]  Natalia N. Ivanova,et al.  Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea , 2017, Nature Biotechnology.

[3]  Wen J. Li,et al.  Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation , 2015, Nucleic Acids Res..

[4]  Siu-Ming Yiu,et al.  IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth , 2012, Bioinform..

[5]  Katherine H. Huang,et al.  Structure, Function and Diversity of the Healthy Human Microbiome , 2012, Nature.

[6]  Tatiana A. Tatusova,et al.  RefSeq microbial genomes database: new representation and annotation strategy , 2013, Nucleic Acids Res..

[7]  Serafim Batzoglou,et al.  Genome-wide reconstruction of complex structural variants using read clouds , 2016, Nature Methods.

[8]  N. Loman,et al.  A complete bacterial genome assembled de novo using only nanopore sequencing data , 2015, Nature Methods.

[9]  Sergey I. Nikolenko,et al.  SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing , 2012, J. Comput. Biol..

[10]  Andrew C. Adey,et al.  Haplotype-resolved genome sequencing of a Gujarati Indian individual , 2011, Nature Biotechnology.

[11]  Meng Li,et al.  Genomic and enzymatic evidence for acetogenesis among multiple lineages of the archaeal phylum Bathyarchaeota widespread in marine sediments , 2016, Nature Microbiology.

[12]  Jens Roat Kultima,et al.  Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes , 2014, Nature Biotechnology.

[13]  Arthur Brady,et al.  Strains, functions and dynamics in the expanded Human Microbiome Project , 2017, Nature.

[14]  Andrew C. Adey,et al.  Haplotype-resolved whole-genome sequencing by contiguity-preserving transposition and combinatorial indexing , 2014, Nature Genetics.

[15]  Yu Lin,et al.  Assembly of long, error-prone reads using repeat graphs , 2018, Nature Biotechnology.

[16]  Jessica C. Ebert,et al.  Accurate whole genome sequencing and haplotyping from10-20 human cells , 2012, Nature.

[17]  Brian C. Thomas,et al.  A new view of the tree of life , 2016, Nature Microbiology.

[18]  N. Kashtan,et al.  Single-Cell Genomics Reveals Hundreds of Coexisting Subpopulations in Wild Prochlorococcus , 2014, Science.

[19]  Kunihiko Sadakane,et al.  MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph , 2014, Bioinform..

[20]  Jo Handelsman,et al.  Metagenomics for studying unculturable microorganisms: cutting the Gordian knot , 2005, Genome Biology.

[21]  Yasubumi Sakakibara,et al.  MetaVelvet: an extension of Velvet assembler to de novo metagenome assembly from short sequence reads , 2012, Nucleic acids research.

[22]  S. Koren,et al.  Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation , 2016, bioRxiv.

[23]  Hanlee P. Ji,et al.  Haplotyping germline and cancer genomes using high-throughput linked-read sequencing , 2015, Nature Biotechnology.

[24]  Marcel Martin Cutadapt removes adapter sequences from high-throughput sequencing reads , 2011 .

[25]  Alexey A. Gurevich,et al.  MetaQUAST: evaluation of metagenome assemblies , 2016, Bioinform..

[26]  S. Tringe,et al.  MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm , 2014, Microbiome.

[27]  Aaron A. Klammer,et al.  Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data , 2013, Nature Methods.

[28]  Derrick E. Wood,et al.  Kraken: ultrafast metagenomic sequence classification using exact alignments , 2014, Genome Biology.

[29]  Michael Snyder,et al.  Synthetic long read sequencing reveals the composition and intraspecies diversity of the human microbiome , 2015, Nature Biotechnology.

[30]  E. Mardis,et al.  An obesity-associated gut microbiome with increased capacity for energy harvest , 2006, Nature.

[31]  Dongwan D. Kang,et al.  MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities , 2015, PeerJ.

[32]  M. Schatz,et al.  Hybrid error correction and de novo assembly of single-molecule sequencing reads , 2012, Nature Biotechnology.

[33]  Brian C. Thomas,et al.  Accurate, multi-kb reads resolve complex populations and detect rare microorganisms , 2015, Genome research.

[34]  B. Baker,et al.  Genomic resolution of linkages in carbon, nitrogen, and sulfur cycling among widespread estuary sediment bacteria , 2015, Microbiome.

[35]  Daniel E. Newburger,et al.  Read clouds uncover variation in complex regions of the human genome. , 2015, Genome research.

[36]  Anders F. Andersson,et al.  Binning metagenomic contigs by coverage and composition , 2014, Nature Methods.

[37]  David L. Wheeler,et al.  GenBank , 2015, Nucleic Acids Res..

[38]  Connor T. Skennerton,et al.  CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes , 2015, Genome research.

[39]  Hendrik Schäfer,et al.  SIP metagenomics identifies uncultivated Methylophilaceae as dimethylsulphide degrading bacteria in soil and lake sediment , 2015, The ISME Journal.

[40]  Dean Laslett,et al.  ARAGORN, a program to detect tRNA genes and tmRNA genes in nucleotide sequences. , 2004, Nucleic acids research.

[41]  Richard Durbin,et al.  Sequence analysis Fast and accurate short read alignment with Burrows – Wheeler transform , 2009 .

[42]  S. Batzoglou,et al.  GATTACA: Lightweight Metagenomic Binning with Compact Indexing of Kmer Counts and MinHash-based Panel Selection , 2017, bioRxiv.

[43]  Aaron M. Newman,et al.  The genome sequence of the colonial chordate, Botryllus schlosseri , 2013, eLife.

[44]  Pavel A. Pevzner,et al.  Assembly of long error-prone reads using de Bruijn graphs , 2016, Proceedings of the National Academy of Sciences.

[45]  Katherine H. Huang,et al.  Detection of low-abundance bacterial strains in metagenomic datasets by eigengenome partitioning , 2015, Nature Biotechnology.

[46]  Sarah J. Fansler,et al.  Moleculo Long-Read Sequencing Facilitates Assembly and Genomic Binning from Complex Soil Metagenomes , 2016, mSystems.

[47]  Torsten Seemann,et al.  Prokka: rapid prokaryotic genome annotation , 2014, Bioinform..

[48]  Brian C. Thomas,et al.  Unusual biology across a group comprising more than 15% of domain Bacteria , 2015, Nature.

[49]  Olli Simell,et al.  The methylome of the gut microbiome: disparate Dam methylation patterns in intestinal Bacteroides dorei , 2014, Front. Microbiol..

[50]  Dennis A. Benson,et al.  GenBank , 2012, Nucleic acids research.