BOTUX: Bayesian-like operational taxonomic unit examiner

Bayesian-like operational taxonomic unit examiner (BOTUX) is a new tool for the classification of 16S rRNA gene sequences into operational taxonomic units (OTUs). It addresses the overestimation of OTU numbers caused by errors introduced during the PCR amplification and DNA sequencing steps. BOTUX utilises a grammar-based assignment strategy in which Bayesian models are built from each word of a given length (e.g., 8-mers). De novo analysis is possible with BOTUX because it does not require a training set and updates its probabilistic models as new sequences are recruited to an OTU. In benchmarking tests performed with real and simulated datasets of 16S rDNA sequences, BOTUX accurately identifies OTUs with comparable or better clustering efficiency and lower execution times than the other OTU algorithms tested. BOTUX is the only OTU classifier that allows incremental analysis of large datasets, and it is also adept at clustering both 454 and Illumina datasets in a reasonable timeframe.
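To make the idea of word-based models that are updated as sequences are recruited more concrete, the following is a minimal sketch and not the BOTUX implementation. It assumes a simple per-OTU count of how many member sequences contain each 8-mer, a pseudocount-smoothed mean log-probability as the score, and a greedy assignment loop. The names `OtuModel`, `cluster`, the pseudocount values, and the `threshold` of -0.7 are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def kmers(seq, k=8):
    """Return the set of overlapping words of length k found in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class OtuModel:
    """Hypothetical per-OTU word model: for each word, the number of
    member sequences that contain it."""

    def __init__(self, k=8):
        self.k = k
        self.n_seqs = 0
        self.word_counts = Counter()

    def add(self, seq):
        """Recruit a sequence and update the word counts incrementally."""
        self.n_seqs += 1
        self.word_counts.update(kmers(seq, self.k))

    def score(self, seq):
        """Mean smoothed log-probability of the query's words under this OTU.
        The 0.5 / 1.0 pseudocounts are an assumption of this sketch."""
        words = kmers(seq, self.k)
        if not words:
            return float("-inf")
        total = sum(
            math.log((self.word_counts[w] + 0.5) / (self.n_seqs + 1.0))
            for w in words
        )
        return total / len(words)


def cluster(seqs, k=8, threshold=-0.7):
    """Greedy de novo clustering: each sequence joins the best-scoring OTU,
    or founds a new OTU when no score reaches the (assumed) threshold."""
    otus, labels = [], []
    for seq in seqs:
        best_idx, best_score = None, float("-inf")
        for idx, model in enumerate(otus):
            s = model.score(seq)
            if s > best_score:
                best_idx, best_score = idx, s
        if best_idx is None or best_score < threshold:
            otus.append(OtuModel(k))
            best_idx = len(otus) - 1
        otus[best_idx].add(seq)  # the model is updated as sequences are recruited
        labels.append(best_idx)
    return labels


# Toy reads: a reference, the same read with one substitution, an unrelated read.
ref = "ACGTTGCAAGCTTAGGCTAACGGATCCGTATGCAATCGGA"
ref_with_error = "ACGTTGCAAGCTTAGGCTAATGGATCCGTATGCAATCGGA"
unrelated = "TTTTGGGGCCCCAAAATTGGCCAATTTGGGCCCAAATGCA"
labels = cluster([ref, ref_with_error, unrelated])
print(labels)
```

In this toy run the read carrying a single substitution still shares most of its 8-mers with the reference, so it is recruited to the same OTU instead of founding a spurious one, while the unrelated read starts a new OTU. Because each model is only a set of counts, further sequences can be added later without re-clustering, which is the property that makes incremental analysis possible in a scheme of this kind.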
