Assessing and Improving Methods Used in Operational Taxonomic Unit-Based Approaches for 16S rRNA Gene Sequence Analysis

ABSTRACT Despite technical advances that have increased sequencing coverage by orders of magnitude, microbial ecologists still grapple with how to interpret the genetic diversity represented by the 16S rRNA gene. Two widely used approaches put sequences into bins based on either their similarity to reference sequences (i.e., phylotyping) or their similarity to other sequences in the community (i.e., operational taxonomic units [OTUs]). In the present study, we investigate three issues related to the interpretation and implementation of OTU-based methods. First, we confirm the conventional wisdom that it is impossible to create an accurate distance-based threshold for defining taxonomic levels and instead advocate for a consensus-based method of classifying OTUs. Second, using a taxonomy-independent approach, we show that the average neighbor clustering algorithm produces more robust OTUs than other hierarchical and heuristic clustering algorithms. Third, we demonstrate several steps that reduce the computational burden of forming OTUs without sacrificing the robustness of the OTU assignment. Finally, by blending these solutions, we propose a new heuristic that has a minimal effect on the robustness of OTUs and significantly reduces the time and memory required. The ability to quickly and accurately assign sequences to OTUs and then obtain taxonomic information for those OTUs will greatly improve OTU-based analyses and overcome many of the challenges encountered with phylotype-based methods.
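
As an illustration of the average neighbor clustering mentioned in the abstract, the following is a minimal sketch that assumes "average neighbor" means average linkage (UPGMA): the two clusters with the smallest mean pairwise distance are merged repeatedly until that distance exceeds a cutoff. The function name `average_neighbor_otus`, the toy distance matrix, and the 0.03 cutoff are assumptions made for this example; this is not the implementation evaluated in the study, and it makes no attempt at the time and memory savings the abstract describes.

```python
from itertools import combinations


def average_neighbor_otus(dist, cutoff):
    """Cluster sequences into OTUs with average linkage (hypothetical helper).

    dist   -- symmetric matrix (list of lists) of pairwise distances
    cutoff -- keep merging while the smallest mean between-cluster
              distance is <= cutoff
    Returns a sorted list of OTUs, each a sorted list of sequence indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def mean_distance(a, b):
        # Mean of all pairwise distances between members of a and b.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest mean distance.
        (x, y), d = min(
            (((a, b), mean_distance(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > cutoff:
            break  # no remaining pair is close enough to merge
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(clusters)


# Toy distances (assumed values): sequences 0/1 and 2/3 are near-identical.
D = [
    [0.00, 0.01, 0.10, 0.10],
    [0.01, 0.00, 0.10, 0.10],
    [0.10, 0.10, 0.00, 0.02],
    [0.10, 0.10, 0.02, 0.00],
]
print(average_neighbor_otus(D, 0.03))  # [[0, 1], [2, 3]]
```

This naive version recomputes every between-cluster mean on each pass, so it is only suitable for small examples; it is meant to show the merge rule, not to scale to large sequence collections.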
