Application of a database-independent approach to assess the quality of OTU picking methods

Assigning 16S rRNA gene sequences to operational taxonomic units (OTUs) allows microbial ecologists to overcome the inconsistencies and biases within bacterial taxonomy and provides a strategy for clustering similar sequences that do not have representatives in a reference database. I have applied the Matthews correlation coefficient to assess the ability of 15 reference-independent and reference-dependent clustering algorithms to assign sequences to OTUs. This metric quantifies the ability of an algorithm to reflect the relationships between sequences without the use of a reference and can be applied to any dataset or method. The most consistently robust method was the average neighbor algorithm; however, for some datasets, other algorithms matched its performance.
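To make the metric concrete, the following minimal sketch shows one way a Matthews correlation coefficient could be computed for an OTU assignment using only pairwise sequence distances and no reference database. It is an illustration under stated assumptions, not the code used in the study: the function name `pairwise_mcc`, the dictionary-based input format, and the 0.03 distance threshold are all assumptions. Each pair of sequences is scored as a true positive (similar and in the same OTU), false positive (dissimilar but in the same OTU), false negative (similar but in different OTUs), or true negative (dissimilar and in different OTUs).

```python
from itertools import combinations
from math import sqrt


def pairwise_mcc(distances, otu_of, threshold=0.03):
    """Matthews correlation coefficient for an OTU assignment.

    distances: dict mapping a sorted (name_a, name_b) tuple to a distance
               (hypothetical input format chosen for this sketch).
    otu_of:    dict mapping each sequence name to its OTU label.
    threshold: distance cutoff defining "similar" (0.03 is an assumption).
    """
    tp = tn = fp = fn = 0
    # sorted() guarantees each pair comes out as (smaller, larger),
    # matching the key order expected in `distances`.
    for a, b in combinations(sorted(otu_of), 2):
        similar = distances[(a, b)] <= threshold
        same_otu = otu_of[a] == otu_of[b]
        if similar and same_otu:
            tp += 1
        elif similar:
            fn += 1
        elif same_otu:
            fp += 1
        else:
            tn += 1
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # convention when a row or column of the table is empty
    return (tp * tn - fp * fn) / denominator


# Toy data: A, B, and C are mutually similar; D is distant from all three.
toy_distances = {
    ("A", "B"): 0.01, ("A", "C"): 0.02, ("B", "C"): 0.02,
    ("A", "D"): 0.10, ("B", "D"): 0.12, ("C", "D"): 0.11,
}
good_assignment = {"A": 1, "B": 1, "C": 1, "D": 2}
poor_assignment = {"A": 1, "B": 1, "C": 2, "D": 2}

print(pairwise_mcc(toy_distances, good_assignment))  # 1.0
print(pairwise_mcc(toy_distances, poor_assignment))  # 0.0
```

In this toy case the assignment that groups A, B, and C together scores 1.0, while the assignment that splits C away from its neighbors and joins it to D scores 0.0. Because the score depends only on the distances and the assignments, the same calculation applies to the output of any clustering method.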
