Metaxa2: improved identification and taxonomic classification of small and large subunit rRNA in metagenomic data

Ribosomal RNA (rRNA) genes are widely used as genetic markers for taxonomic identification of microbes. The small subunit (SSU; 16S/18S) rRNA gene in particular is frequently used for species- or genus-level identification, and the large subunit (LSU; 23S/28S) rRNA gene is also employed in taxonomic assignment. The Metaxa software tool is a popular utility for extracting partial rRNA sequences from large sequencing data sets and assigning them to an archaeal, bacterial, nuclear eukaryote, mitochondrial or chloroplast origin. This study describes a comprehensive update to Metaxa, Metaxa2, that extends the capabilities of the tool by introducing support for the LSU rRNA gene, a greatly improved classifier allowing classification down to genus or species level, and enhanced support for short-read (100 bp) and paired-end sequences, among other changes. The performance of Metaxa2 was compared to other commonly used taxonomic classifiers, showing that Metaxa2 often outperforms previous methods in terms of making correct predictions while maintaining a low misclassification rate. Metaxa2 is freely available from http://microbiology.se/software/metaxa2/.
