Deblur Rapidly Resolves Single-Nucleotide Community Sequence Patterns

ABSTRACT High-throughput sequencing of 16S ribosomal RNA gene amplicons has facilitated understanding of complex microbial communities, but the inherent noise in PCR and DNA sequencing limits differentiation of closely related bacteria. Although many scientific questions can be addressed with broad taxonomic profiles, clinical, food safety, and some ecological applications require higher specificity. Here we introduce a novel sub-operational-taxonomic-unit (sOTU) approach, Deblur, that uses error profiles to obtain putative error-free sequences from Illumina MiSeq and HiSeq sequencing platforms. Deblur substantially reduces computational demands relative to similar sOTU methods and does so with similar or better sensitivity and specificity. Using simulations, mock mixtures, and real data sets, we detected closely related bacterial sequences with single-nucleotide differences while removing false positives and maintaining stability in detection, suggesting that Deblur is limited only by read length and diversity within the amplicon sequences. Because Deblur operates on a per-sample level, it scales to modern data sets and meta-analyses. To highlight Deblur’s ability to integrate data sets, we include an interactive exploration of its application to multiple distinct sequencing rounds of the American Gut Project. Deblur is open source under the Berkeley Software Distribution (BSD) license, easily installable, and downloadable from https://github.com/biocore/deblur.
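The abstract says only that Deblur "uses error profiles" to recover putative error-free sequences within each sample. The sketch below shows one way such a profile can be applied, under a greedy scheme that is an assumption here. Unique reads are ranked by abundance. Each read then subtracts, from every less abundant read within a small Hamming distance, the number of reads expected to be its own sequencing errors. The sketch is simplified and handles substitutions only, ignoring indels and chimeras. The error-profile values and the names `ERROR_PROFILE`, `denoise_sample`, and `hamming` are hypothetical illustrations, not Deblur's API or its defaults.

```python
from dataclasses import dataclass

# Hypothetical error profile: entry h is the assumed fraction of a true
# sequence's reads that show up as variants exactly h substitutions away.
# Index 0 is a placeholder; a sequence is never subtracted from itself.
# These numbers are illustrative assumptions, not Deblur's defaults.
ERROR_PROFILE = [1.0, 0.06, 0.02, 0.02, 0.01]


@dataclass
class Variant:
    seq: str
    count: float


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length reads."""
    if len(a) != len(b):
        raise ValueError("reads must be trimmed to the same length")
    return sum(x != y for x, y in zip(a, b))


def denoise_sample(counts: dict[str, int],
                   profile: list[float] = ERROR_PROFILE) -> dict[str, int]:
    """Greedy, abundance-ordered error subtraction within one sample (sketch)."""
    # Rank unique reads from most to least abundant.
    variants = sorted((Variant(s, float(c)) for s, c in counts.items()),
                      key=lambda v: v.count, reverse=True)
    max_h = len(profile) - 1
    for i, source in enumerate(variants):
        if source.count <= 0:
            continue  # fully explained as noise, so it spawns no further errors
        for target in variants[i + 1:]:
            h = hamming(source.seq, target.seq)
            if 0 < h <= max_h:
                # Remove the reads expected to be errors derived from `source`.
                target.count -= source.count * profile[h]
    # Keep sequences whose corrected abundance is still positive.
    return {v.seq: round(v.count) for v in variants if round(v.count) > 0}


sample = {"ACGTACGT": 1000, "ACGTACGA": 300, "ACGTACCA": 15, "TTTTTTTT": 30}
print(denoise_sample(sample))
# -> {'ACGTACGT': 1000, 'ACGTACGA': 240, 'TTTTTTTT': 30}
```

In this toy input, `ACGTACGA` differs from the dominant read by one substitution. Its 300 reads exceed the 60 expected errors, so it is kept as a distinct sequence. The 15 reads of `ACGTACCA` are fully explained by expected errors and are dropped.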
IMPORTANCE Deblur provides a rapid and sensitive means to assess ecological patterns driven by differentiation of closely related taxa. This algorithm provides a solution to the problem of identifying real ecological differences between taxa whose amplicons differ by a single base pair, is applicable in an automated fashion to large-scale sequencing data sets, and can integrate sequencing runs collected over time.
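Integrating runs collected over time follows from two properties: per-sample operation, and the fact that each sOTU is an exact sequence. Tables produced at different times can therefore be joined on the sequence itself, with no need to re-cluster everything together. This assumes all runs were trimmed to the same read length. Otherwise, the same organism would yield sequences of different lengths that do not match. The sketch below illustrates the join with plain dictionaries. `merge_runs` is a hypothetical helper, not part of Deblur.

```python
from collections import defaultdict


def merge_runs(*runs: dict[str, dict[str, int]]) -> dict[str, dict[str, int]]:
    """Join per-sample outputs from separate runs into one sequence-by-sample table.

    Each run maps a sample ID to {exact_sequence: read_count}. Because the
    features are exact sequences rather than run-specific cluster IDs, the same
    sequence observed in different runs falls into the same row.
    """
    table: dict[str, dict[str, int]] = defaultdict(dict)
    seen_samples: set[str] = set()
    for run in runs:
        for sample_id, features in run.items():
            if sample_id in seen_samples:
                raise ValueError(f"duplicate sample ID across runs: {sample_id}")
            seen_samples.add(sample_id)
            for seq, count in features.items():
                table[seq][sample_id] = count
    return dict(table)


run_a = {"S1": {"ACGTACGT": 1000, "ACGTACGA": 240}}
run_b = {"S2": {"ACGTACGT": 800, "TTTTTTTT": 30}}
print(merge_runs(run_a, run_b))
# -> {'ACGTACGT': {'S1': 1000, 'S2': 800}, 'ACGTACGA': {'S1': 240}, 'TTTTTTTT': {'S2': 30}}
```

Adding a later run only appends columns and, for newly observed sequences, rows. Earlier samples never need to be reprocessed.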