Bacterial Community Reconstruction Using Compressed Sensing

Bacteria are the unseen majority on our planet, comprising millions of species and most of the living protoplasm. We propose a novel approach for reconstructing the composition of an unknown mixture of bacteria from a single Sanger-sequencing reaction of the mixture. Our method is based on compressed sensing theory, which deals with the reconstruction of a sparse signal from a small number of measurements. Using the fact that in many cases a bacterial community comprises only a small subset of all known bacterial species, we show the feasibility of this approach for determining the composition of a bacterial mixture. Using simulations, we show that sequencing a few hundred base pairs of the 16S rRNA gene may provide enough information to reconstruct mixtures containing tens of species, out of tens of thousands, even in the presence of realistic measurement noise. Finally, we show promising initial results when applying our method to the reconstruction of a toy experimental mixture of five species. Our approach may offer a simple and efficient way to identify bacterial species compositions in biological samples. All supplementary data and the MATLAB code are available at www.broadinstitute.org/?orzuk/publications/BCS/.
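To make the sparse-reconstruction idea concrete, the following is a minimal sketch and not the authors' MATLAB implementation. It assumes an idealized, noise-free measurement in which each sequenced position reports the frequency of each of the four bases in the mixture, so the measurement is a linear combination of per-species 0/1 profiles. The database (all 256 sequences of length 4), the planted mixture, and the names `profile`, `solve` and `omp` are hypothetical and chosen only for illustration. Orthogonal matching pursuit, a simple greedy sparse-recovery method, stands in here for whatever solver the paper actually uses.

```python
from itertools import product

BASES = "ACGT"

def profile(seq):
    """0/1 vector with one indicator per (position, base) pair."""
    return [1.0 if seq[p] == b else 0.0 for p in range(len(seq)) for b in BASES]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(G, rhs):
    """Solve the small dense system G w = rhs (Gaussian elimination, partial pivoting).
    Assumes G is non-singular, which holds for the toy data below."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(G, rhs)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    w = [0.0] * n
    for r in reversed(range(n)):
        w[r] = (M[r][n] - sum(M[r][k] * w[k] for k in range(r + 1, n))) / M[r][r]
    return w

def omp(columns, y, max_species=5, tol=1e-9):
    """Greedy sparse recovery: returns {column index: estimated weight}."""
    support, weights, residual = [], [], y[:]
    while len(support) < max_species and dot(residual, residual) ** 0.5 > tol:
        # Pick the candidate most correlated with the still-unexplained signal.
        scores = [abs(dot(col, residual)) for col in columns]
        best = max(range(len(columns)), key=scores.__getitem__)
        if best in support:
            break
        support.append(best)
        # Least-squares refit of the weights on the chosen species only.
        G = [[dot(columns[i], columns[j]) for j in support] for i in support]
        rhs = [dot(columns[i], y) for i in support]
        weights = solve(G, rhs)
        residual = [
            yi - sum(w * columns[i][k] for w, i in zip(weights, support))
            for k, yi in enumerate(y)
        ]
    return dict(zip(support, weights))

# Hypothetical database: every length-4 sequence (256 candidates, 16 measurements).
database = ["".join(s) for s in product(BASES, repeat=4)]
columns = [profile(s) for s in database]

# Planted sparse mixture (an assumption of this toy example).
truth = {"ACGT": 0.5, "CGTA": 0.3, "GTAC": 0.2}
y = [sum(w * profile(s)[k] for s, w in truth.items()) for k in range(16)]

recovered = {database[i]: w for i, w in omp(columns, y).items()}
print(sorted(recovered.items(), key=lambda kv: -kv[1]))
```

With 256 candidate species and only 16 measurements the linear system is underdetermined, so many dense mixtures explain the same measurement; the sparsity assumption is what singles out the three planted species. The planted sequences were chosen to differ at every position, which makes this toy case easy. Real 16S sequences are far more similar to one another, and real traces are noisy, so this sketch says nothing about the accuracy reported in the abstract.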
