Metaxa2 Database Builder: enabling taxonomic identification from metagenomic or metabarcoding data using any genetic marker

Abstract

Motivation: Correct taxonomic identification of DNA sequences is central to studies of biodiversity using both shotgun metagenomic and metabarcoding approaches. However, no genetic marker gives sufficient performance across all the biological kingdoms, hampering studies of taxonomic diversity in many groups of organisms. This has led to the adoption of a range of genetic markers for DNA metabarcoding. While many taxonomic classification software tools can be re-trained on these genetic markers, they are often designed with assumptions that impair their utility on genes other than the SSU and LSU rRNA. Here, we present an update to Metaxa2 that enables the use of any genetic marker for taxonomic classification of metagenome and amplicon sequence data.

Results: We evaluated the Metaxa2 Database Builder on 11 commonly used barcoding regions and found that while there are wide differences in performance between different genetic markers, our software performs satisfactorily provided that the input taxonomy and sequence data are of high quality.

Availability and implementation: Freely available on the web as part of the Metaxa2 package at http://microbiology.se/software/metaxa2/.

Supplementary information: Supplementary data are available at Bioinformatics online.
