Significant loss of sensitivity and specificity in the taxonomic classification occurs when short 16S rRNA gene sequences are used

The classification performance of Kraken was evaluated in terms of sensitivity and specificity using short and long 16S rRNA gene sequences. A total of 440,738 bacterial sequences with complete taxonomic classifications were downloaded from SILVA, a high-quality ribosomal RNA database. Amplicons (86,371 sequences; 1450 bp) produced by virtual PCR with primers spanning the V1–V9 region of the 16S rRNA gene were used as the reference. Virtual PCRs of the internal fragments V3–V4, V4–V5 and V3–V5 were then performed, yielding 81,523, 82,334 and 82,998 amplicons respectively. The depth of taxonomic classification differed among the internal fragments. For instance, the sensitivity and specificity of sequences classified down to subspecies level were higher for the largest internal fragment, V3–V5 (54.0 and 74.6% respectively), than for the V3–V4 (45.1 and 66.7%) and V4–V5 (41.8 and 64.6%) fragments. A similar pattern was detected for sequences classified only to shallower taxonomic ranks (e.g. family, order, class). The results also show that internal fragments lost specificity and that some could be misclassified at the deepest taxonomic levels (i.e. species or subspecies). It is concluded that the larger V3–V5 fragment could be considered for massive high-throughput sequencing in order to reduce the loss of sensitivity and specificity.
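The virtual PCR step described above can be illustrated with a minimal Python sketch. It is not the tool or the primer set used in the study, neither of which is stated here. It assumes exact primer matching with IUPAC degenerate bases expanded to regular-expression character classes, and the function names, template and primers are hypothetical toy values chosen only for illustration.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also covers the degenerate codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence or degenerate primer."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Regex pattern matching every sequence a degenerate primer encodes."""
    return "".join(IUPAC[base] for base in primer.upper())


def virtual_pcr(template, forward, reverse):
    """Return the amplicon (primers included), or None if a primer fails.

    Assumption: primers must match exactly (no mismatches tolerated), and
    the reverse primer is given 5'->3' on the opposite strand.
    """
    template = template.upper().replace("U", "T")  # accept RNA-style input
    fwd = re.search(primer_regex(forward), template)
    if fwd is None:
        return None
    downstream = template[fwd.end():]
    rev = re.search(primer_regex(reverse_complement(reverse)), downstream)
    if rev is None:
        return None
    return template[fwd.start():fwd.end() + rev.end()]


# Toy example (hypothetical sequences, not real 16S primers).
toy_template = "TTTACGTACGGGGCCCCTTGACAAAA"
amplicon = virtual_pcr(toy_template, forward="ACGTRC", reverse="TGTCAA")
print(amplicon, len(amplicon))
```

Applied to each full-length reference sequence with one primer pair per region, a routine like this would yield a different number of amplicons per region, since sequences in which either primer fails to match are dropped. That is consistent with the differing amplicon counts reported above, although the study's actual matching criteria may have been more permissive.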
