Microbiome Profiling by Illumina Sequencing of Combinatorial Sequence-Tagged PCR Products

We developed a low-cost, high-throughput microbiome profiling method that uses combinatorial sequence tags attached to PCR primers that amplify the rRNA V6 region. The PCR products are sequenced using an Illumina paired-end protocol to generate millions of overlapping reads. Combinatorial sequence tagging can be used to examine hundreds of samples with far fewer primers than are required when sequence tags are incorporated at only a single end. The number of reads generated permitted saturating or near-saturating analysis of samples of the vaginal microbiome. The large number of reads also allowed an in-depth analysis of errors, and we found that PCR-induced errors made up the vast majority of non-organism-derived species variants, an observation that has significant implications for sequence clustering of similar high-throughput data. We show that the short reads are sufficient to assign organisms to the genus or species level in most cases. We suggest that this method will be useful for the deep sequencing of any short nucleotide region that is taxonomically informative; these include the V3 and V5 regions of the bacterial 16S rRNA genes and the eukaryotic V9 region that is gaining popularity for sampling protist diversity.
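The primer saving from combinatorial tagging follows from simple counting: with n tagged forward primers and m tagged reverse primers, n × m samples can be distinguished using only n + m primers. The sketch below illustrates that arithmetic and one possible way of assigning tag pairs to samples. It is a minimal illustration, not the pipeline used in this work. The function names, the example tag sequences, and the single-end comparison (one tagged primer per sample plus one shared untagged primer) are assumptions made for the example.

```python
from itertools import product


def primers_needed(n_forward_tags: int, n_reverse_tags: int) -> dict:
    """Compare primer counts for combinatorial vs. single-end tagging.

    Assumption: single-end tagging uses one tagged primer per sample
    plus one shared untagged primer for the opposite end.
    """
    samples = n_forward_tags * n_reverse_tags
    return {
        "samples": samples,
        "combinatorial_primers": n_forward_tags + n_reverse_tags,
        "single_end_primers": samples + 1,
    }


def assign_tag_pairs(sample_names, forward_tags, reverse_tags):
    """Give each sample a unique (forward tag, reverse tag) pair."""
    pairs = list(product(forward_tags, reverse_tags))
    if len(sample_names) > len(pairs):
        raise ValueError("more samples than available tag combinations")
    return dict(zip(sample_names, pairs))


if __name__ == "__main__":
    # 16 forward and 16 reverse tags cover 256 samples with 32 primers.
    print(primers_needed(16, 16))

    # Hypothetical tag sequences, for illustration only.
    forward = ["ACGT", "TGCA"]
    reverse = ["GATC", "CTAG"]
    print(assign_tag_pairs(["s1", "s2", "s3"], forward, reverse))
```

Under these assumptions, 16 forward and 16 reverse tags distinguish 256 samples with 32 primers, whereas tagging a single end would need 257.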
