Benchmarking taxonomic assignments based on 16S rRNA gene profiling of the microbiota from commonly sampled environments

Abstract

Background: Taxonomic profiling of ribosomal RNA (rRNA) sequences has been the accepted norm for inferring the composition of complex microbial ecosystems. Quantitative Insights Into Microbial Ecology (QIIME) and mothur have been the most widely used taxonomic analysis tools for this purpose, with MAPseq and QIIME 2 being two recently released alternatives. However, no independent and direct comparison between these four main tools has been performed. Here, we compared the default classifiers of MAPseq, mothur, QIIME, and QIIME 2 using synthetic simulated datasets composed of some of the most abundant genera found in the human gut, ocean, and soil environments. We evaluated their accuracy when paired with different reference databases and with different variable sub-regions of the 16S rRNA gene.

Findings: We show that QIIME 2 provided the best recall and F-scores at genus and family levels, together with the lowest distance estimates between the observed and simulated samples. However, MAPseq showed the highest precision, with miscall rates consistently <2%. Notably, QIIME 2 was the most computationally expensive tool, with CPU time and memory usage almost 2 and 30 times higher than MAPseq, respectively. Using the SILVA database generally yielded a higher recall than using Greengenes, while assignment results for different 16S rRNA variable sub-regions varied by up to 40% between samples analysed with the same pipeline.

Conclusions: Our results support the use of either QIIME 2 or MAPseq for optimal 16S rRNA gene profiling, and we suggest that the choice between the two should be based on the level of recall, precision, and/or computational performance required.
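The abstract compares tools by precision, recall, and F-score. As a minimal sketch of how such metrics are commonly computed for one simulated sample, the Python below treats the simulated and predicted genera as sets. The function name, the genus lists, and the presence/absence set-based definition are illustrative assumptions; the abstract does not state the exact formulas used in the study.

```python
def classification_scores(expected, predicted):
    """Return (precision, recall, F-score) for one sample.

    Assumption: taxa are compared as presence/absence sets, which may
    differ from the definitions used in the benchmark itself.
    """
    expected, predicted = set(expected), set(predicted)
    true_pos = len(expected & predicted)   # simulated taxa that were recovered
    false_pos = len(predicted - expected)  # miscalls: predicted but not simulated
    false_neg = len(expected - predicted)  # simulated taxa that were missed

    precision = true_pos / (true_pos + false_pos) if predicted else 0.0
    recall = true_pos / (true_pos + false_neg) if expected else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


# Hypothetical genus lists, for illustration only.
simulated = ["Bacteroides", "Faecalibacterium", "Prevotella", "Roseburia"]
assigned = ["Bacteroides", "Faecalibacterium", "Prevotella", "Escherichia"]
precision, recall, f_score = classification_scores(simulated, assigned)
print(precision, recall, f_score)
```

Under these definitions, a tool that reports few taxa it is unsure about gains precision at the cost of recall, which is the trade-off the conclusions point to when choosing between MAPseq and QIIME 2.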
