mTAGs: taxonomic profiling using degenerate consensus reference sequences of ribosomal RNA genes

Abstract
Profiling the taxonomic composition of microbial communities commonly involves the classification of ribosomal RNA gene fragments. As a trade-off to maintain high classification accuracy, existing tools are typically limited to the genus level. Here, we present mTAGs, a taxonomic profiling tool that aligns metagenomic sequencing reads to degenerate consensus reference sequences of small subunit ribosomal RNA genes. It uses DNA fragments, that is, paired-end sequencing reads, as count units and provides relative abundance profiles at multiple taxonomic ranks, including operational taxonomic units based on a 97% sequence identity cutoff. At the genus rank, mTAGs outperformed other tools across several metrics, for example improving the F1 score by >11% across data from different environments, and it achieved competitive (F1 score) or better (Bray–Curtis dissimilarity) results at the sub-genus level.

Availability and implementation
The software tool mTAGs is implemented in Python. The source code and binaries are freely available (https://github.com/SushiLab/mTAGs). The data underlying this article are available in Zenodo, at https://doi.org/10.5281/zenodo.4352762.

Supplementary information
Supplementary data are available at Bioinformatics online.
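The abstract states that mTAGs counts DNA fragments (read pairs) rather than individual reads, reports relative abundances at several taxonomic ranks, and was compared with other tools using the F1 score and Bray–Curtis dissimilarity. The Python sketch below illustrates those ideas under stated assumptions; it is not the mTAGs implementation. It assumes each mate has already been assigned a lineage tuple (the alignment to degenerate consensus references is not shown). The rule for combining mates that disagree (keep only the leading ranks they share), the function names, and the toy data are all hypothetical.

```python
from collections import Counter

UNASSIGNED = "unassigned"


def common_prefix(a, b):
    """Leading ranks on which two lineages agree (hypothetical consensus rule)."""
    shared = []
    for x, y in zip(a, b):
        if x != y:
            break
        shared.append(x)
    return tuple(shared)


def fragment_lineages(mate1, mate2):
    """Return one lineage per fragment, so a read pair is counted once.

    Assumption: if both mates are assigned, keep the shared leading ranks;
    if only one mate is assigned, use that mate's lineage.
    """
    lineages = {}
    for frag in set(mate1) | set(mate2):
        a, b = mate1.get(frag), mate2.get(frag)
        if a is not None and b is not None:
            lineages[frag] = common_prefix(a, b)
        else:
            lineages[frag] = a if a is not None else b
    return lineages


def relative_abundance(lineages, rank):
    """Fraction of fragments per taxon at a rank index; too-short lineages are unassigned."""
    counts = Counter(
        lin[rank] if len(lin) > rank else UNASSIGNED for lin in lineages.values()
    )
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()} if total else {}


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity: sum |p_i - q_i| / sum (p_i + q_i)."""
    taxa = set(p) | set(q)
    num = sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in taxa)
    den = sum(p.get(t, 0.0) + q.get(t, 0.0) for t in taxa)
    return num / den if den else 0.0


def f1_score(predicted, truth):
    """Presence/absence F1 = 2TP / (2TP + FP + FN), ignoring the unassigned bin.

    Returns 0.0 when both taxon sets are empty (a convention chosen here).
    """
    pred = {t for t, a in predicted.items() if a > 0 and t != UNASSIGNED}
    true = {t for t, a in truth.items() if a > 0 and t != UNASSIGNED}
    tp, fp, fn = len(pred & true), len(pred - true), len(true - pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy example: lineages are (domain, phylum, genus); fragment IDs are made up.
mate1 = {
    "f1": ("Bacteria", "Firmicutes", "Bacillus"),
    "f2": ("Bacteria", "Proteobacteria", "Escherichia"),
    "f3": ("Bacteria", "Proteobacteria", "Escherichia"),
}
mate2 = {
    "f1": ("Bacteria", "Firmicutes", "Bacillus"),
    "f2": ("Bacteria", "Proteobacteria", "Pseudomonas"),  # mates disagree at genus
    "f4": ("Bacteria", "Proteobacteria", "Pseudomonas"),
}
lineages = fragment_lineages(mate1, mate2)
genus = relative_abundance(lineages, rank=2)
phylum = relative_abundance(lineages, rank=1)
truth = {"Bacillus": 0.5, "Escherichia": 0.5}  # hypothetical reference profile

print(sorted(genus.items()))
print(bray_curtis(genus, truth))  # 0.5
print(f1_score(genus, truth))  # 0.8
```

Here fragment f2 is counted once and, because its mates disagree at the genus rank, it contributes to the phylum profile but falls into the unassigned bin at the genus rank. Whether the unassigned fraction should enter the Bray–Curtis comparison is a design choice; this sketch keeps it, which penalizes profiles that leave many fragments unclassified.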