MAPseq: highly efficient k-mer search with confidence estimates, for rRNA sequence analysis

Abstract

Motivation: Ribosomal RNA profiling has become crucial to studying microbial communities, but meaningful taxonomic analysis and inter-comparison of such data are still hampered by technical limitations, between-study design variability and inconsistencies between taxonomies used.

Results: Here we present MAPseq, a framework for reference-based rRNA sequence analysis that is up to 30% more accurate (F½ score) and up to one hundred times faster than existing solutions, providing in a single run multiple taxonomy classifications and hierarchical operational taxonomic unit mappings, for rRNA sequences in both amplicon and shotgun sequencing strategies, and for datasets of virtually any size.

Availability and implementation: Source code and binaries are freely available at https://github.com/jfmrod/mapseq.

Supplementary information: Supplementary data are available at Bioinformatics online.
