Application of a Database-Independent Approach To Assess the Quality of Operational Taxonomic Unit Picking Methods

Assignment of 16S rRNA gene sequences to operational taxonomic units (OTUs) allows microbial ecologists to overcome the inconsistencies and biases within bacterial taxonomy and provides a strategy for clustering similar sequences that do not have representatives in a reference database. I have applied the Matthews correlation coefficient to assess the ability of 15 reference-independent and -dependent clustering algorithms to assign sequences to OTUs. This metric quantifies the ability of an algorithm to reflect the relationships between sequences without the use of a reference and can be applied to any data set or method. The most consistently robust method was the average neighbor algorithm; however, for some data sets, other algorithms matched its performance.
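The abstract does not spell out how the Matthews correlation coefficient (MCC) is computed for OTU assignments. This sketch assumes one pairwise formulation, in which every pair of sequences is treated as a binary call:

- A pair is a true positive (TP) if its distance is at or below the OTU threshold and both sequences share an OTU.
- It is a false negative (FN) if it is within the threshold but split across OTUs.
- It is a false positive (FP) if it is beyond the threshold but merged.
- It is a true negative (TN) otherwise.

The MCC is then (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

The code scores a naive average-linkage clustering stopped at a cutoff, used here as a rough stand-in for the average neighbor algorithm. It is not the implementation of any particular tool. The 0.03 threshold, the toy distance matrix, and the function names (average_neighbor, mcc_counts, mcc) are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def average_neighbor(dist, threshold=0.03):
    """Naive average-linkage clustering (hypothetical helper).

    Repeatedly merges the two clusters with the smallest mean pairwise
    distance, stopping once that mean exceeds the threshold. O(n^3); toy use only.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            ca, cb = clusters[a], clusters[b]
            avg = sum(dist[i][j] for i in ca for j in cb) / (len(ca) * len(cb))
            if best is None or avg < best[0]:
                best = (avg, a, b)
        avg, a, b = best
        if avg > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a stays valid
    labels = [0] * len(dist)
    for k, members in enumerate(clusters):
        for i in members:
            labels[i] = k
    return labels


def mcc_counts(dist, otus, threshold=0.03):
    """Pairwise confusion counts under the assumed definition:
    a pair 'should' share an OTU when its distance is <= threshold."""
    tp = tn = fp = fn = 0
    for i, j in combinations(range(len(otus)), 2):
        close = dist[i][j] <= threshold
        together = otus[i] == otus[j]
        if close and together:
            tp += 1
        elif close:
            fn += 1
        elif together:
            fp += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical symmetric distance matrix for four sequences.
toy = [
    [0.00, 0.02, 0.05, 0.20],
    [0.02, 0.00, 0.03, 0.20],
    [0.05, 0.03, 0.00, 0.20],
    [0.20, 0.20, 0.20, 0.00],
]
labels = average_neighbor(toy, threshold=0.03)
print(labels, mcc(*mcc_counts(toy, labels, threshold=0.03)))  # [0, 0, 1, 2], MCC about 0.632
```

On this toy matrix, sequences 1 and 2 (0-based) are exactly 0.03 apart but land in different OTUs. They are split because the average distance from sequence 2 to the {0, 1} cluster is 0.04. That pair counts as a false negative, giving TP=1, TN=4, FP=0, FN=1 and an MCC of 4/sqrt(40).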
