Strain- and plasmid-level deconvolution of a synthetic metagenome by sequencing proximity ligation products

Metagenomics is a valuable tool for the study of microbial communities but has been limited by the difficulty of “binning” the resulting sequences into groups corresponding to the individual species and strains that constitute the community. Moreover, there are presently no methods to track the flow of mobile DNA elements such as plasmids through communities or to determine which of these are co-localized within the same cell. We address these limitations by applying Hi-C, a technology originally designed for the study of three-dimensional genome structure in eukaryotes, to measure the cellular co-localization of DNA sequences. We leveraged Hi-C data generated from a simple synthetic metagenome sample to accurately cluster metagenome assembly contigs into groups that contain nearly complete genomes of each species. The Hi-C data also reliably associated plasmids with the chromosomes of their hosts and with each other. We further demonstrated that Hi-C data provide a long-range signal of strain-specific genotypes, indicating that such data may be useful for high-resolution genotyping of microbial populations. Our work demonstrates that Hi-C sequencing data provide valuable information for metagenome analyses that is not currently obtainable by other methods. This metagenomic Hi-C method could facilitate future studies of the fine-scale population structure of microbes, as well as studies of how antibiotic resistance plasmids (or other genetic elements) mobilize in microbial communities. The method is not limited to microbiology; the genetic architecture of other heterogeneous populations of cells could also be studied with this technique.
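To make the binning idea concrete, the following is a minimal sketch of how Hi-C read pairs that link two contigs might be used to group contigs by cell of origin. It is an illustration under stated assumptions, not the authors' pipeline: the function name `cluster_contigs`, the length-based normalization, the `min_density` cutoff, and the toy contig names and counts are all hypothetical, and simple thresholding with connected components stands in for whatever clustering method a real analysis would use.

```python
from collections import defaultdict

def cluster_contigs(link_counts, contig_lengths, min_density=1.0):
    """Group contigs connected by sufficiently dense Hi-C links.

    link_counts:    dict mapping (contig_a, contig_b) -> Hi-C read pairs
                    joining the two contigs (hypothetical input format)
    contig_lengths: dict mapping contig name -> length in bp
    min_density:    assumed cutoff, in read pairs per kb^2
    """
    # Union-find structure: every contig starts in its own group.
    parent = {c: c for c in contig_lengths}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path compression
            c = parent[c]
        return c

    for (a, b), n in link_counts.items():
        # Assumed normalization: longer contigs attract more links by
        # chance, so divide by the product of the lengths in kb.
        density = n / ((contig_lengths[a] / 1000) * (contig_lengths[b] / 1000))
        if density >= min_density:
            parent[find(a)] = find(b)  # merge the two groups

    groups = defaultdict(set)
    for c in contig_lengths:
        groups[find(c)].add(c)
    return sorted(sorted(g) for g in groups.values())

# Toy data (invented for illustration): two species, one carrying a plasmid.
lengths = {"chrA_1": 10000, "chrA_2": 20000, "plasmidA": 5000,
           "chrB_1": 10000, "chrB_2": 10000}
links = {("chrA_1", "chrA_2"): 400,    # same chromosome, dense links
         ("chrA_1", "plasmidA"): 100,  # plasmid in the same cell
         ("chrB_1", "chrB_2"): 300,    # same chromosome, dense links
         ("chrA_2", "chrB_1"): 20}     # sparse between-species noise

print(cluster_contigs(links, lengths))
# [['chrA_1', 'chrA_2', 'plasmidA'], ['chrB_1', 'chrB_2']]
```

In this toy case the plasmid contig joins its host's chromosome group because its link density to that chromosome is high, while the sparse between-species links fall below the cutoff. The threshold is the fragile part of such a scheme: setting it too low merges distinct genomes into one group.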
