A comprehensive benchmarking study of protocols and sequencing platforms for 16S rRNA community profiling

Background

In the last 5 years, the rapid pace of innovations and improvements in sequencing technologies has completely changed the landscape of metagenomic and metagenetic experiments. It is therefore critical to benchmark the various methodologies for interrogating the composition of microbial communities, so that we can assess their strengths and limitations. The most common phylogenetic marker for microbial community diversity studies is the 16S ribosomal RNA gene, and in the last 10 years the field has moved from sequencing a small number of amplicons and samples to more complex studies in which thousands of samples and multiple gene regions are interrogated.

Results

We assembled 2 synthetic communities, with an even (EM) and an uneven (UM) distribution of archaeal and bacterial strains and species, as metagenomic control material to assess the performance of different experimental strategies. We used these communities to highlight the limitations and advantages of the leading sequencing platforms: MiSeq (Illumina), RSII (Pacific Biosciences), 454 GS-FLX/+ (Roche), and IonTorrent (Life Technologies). We describe an extensive survey of the synthetic communities using 3 experimental designs (fusion primers, universal tailed tag, ligated adaptors) across the 9 hypervariable 16S rDNA regions. We demonstrate that library preparation methodology can affect data interpretation because of the different error and chimera rates generated during the procedure. The observed community composition was always biased, to a degree that depended on the platform, sequenced region and primer choice. Crucially, however, our analysis suggests that 16S rRNA sequencing is still quantitative, in that relative changes in the abundance of taxa between samples can be recovered despite these biases.

Conclusion

We have assessed a range of experimental conditions across several next-generation sequencing platforms using the most up-to-date configurations. We propose that the choice of sequencing platform and experimental design be considered at an early stage of a project, by running a small trial covering several hypervariable regions to quantify the discriminatory power of each region. We also suggest using a synthetic community as a positive control to identify the potential biases and procedural drawbacks that may lead to data misinterpretation. The results of this study will serve as a guideline for deciding which experimental conditions and sequencing platform to consider to achieve the best microbial profiling.
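The claim that relative changes between samples can be recovered despite a biased composition can be illustrated with a toy calculation. The sketch below is not the analysis used in the study: it assumes, purely for illustration, that bias acts as a fixed multiplicative efficiency per taxon, and every taxon name and number in it is hypothetical. Under that assumption the observed proportions are distorted in both samples, yet the change in the ratio of two taxa between samples is unaffected, because the per-taxon factors cancel.

```python
import math


def observed_proportions(true_abundance, efficiency):
    """Scale each taxon by its (hypothetical) efficiency, then renormalise to proportions."""
    weighted = {taxon: true_abundance[taxon] * efficiency[taxon] for taxon in true_abundance}
    total = sum(weighted.values())
    return {taxon: value / total for taxon, value in weighted.items()}


def ratio_change(sample_a, sample_b, taxon, reference):
    """Fold change of the taxon:reference ratio going from sample_a to sample_b."""
    ratio_a = sample_a[taxon] / sample_a[reference]
    ratio_b = sample_b[taxon] / sample_b[reference]
    return ratio_b / ratio_a


# Hypothetical true compositions loosely mimicking an even and an uneven mock community.
even = {"taxon_A": 1.0, "taxon_B": 1.0, "taxon_C": 1.0}
uneven = {"taxon_A": 10.0, "taxon_B": 1.0, "taxon_C": 0.1}

# Hypothetical per-taxon efficiencies (assumed constant across samples).
efficiency = {"taxon_A": 0.5, "taxon_B": 1.0, "taxon_C": 2.0}

obs_even = observed_proportions(even, efficiency)
obs_uneven = observed_proportions(uneven, efficiency)

true_change = ratio_change(even, uneven, "taxon_A", "taxon_B")
obs_change = ratio_change(obs_even, obs_uneven, "taxon_A", "taxon_B")

print("observed even-community proportions:", obs_even)
print("ratio change agrees with truth:", math.isclose(true_change, obs_change))
```

If efficiencies differed between samples (for example, between platforms or primer sets), the factors would no longer cancel, which is consistent with comparing samples only within a single experimental design.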
