An evaluation of the accuracy and speed of metagenome analysis tools

Metagenome studies are becoming increasingly widespread, yielding important insights into microbial communities from diverse environments, ranging from terrestrial and aquatic ecosystems to the human skin and gut. With the advent of high-throughput sequencing platforms, large-scale shotgun sequencing approaches are now commonplace. However, a thorough, independent benchmark comparing state-of-the-art metagenome analysis tools is lacking. Here we present a benchmark in which the most widely used tools are tested on complex, realistic data sets. Our results clearly show that the most widely used tools are not necessarily the most accurate, that the most accurate tool is not necessarily the most time-consuming, and that there is a high degree of variability between available tools. These findings are important because the conclusions of any metagenomics study are affected by errors in the predicted community composition and functional capacity. Data sets and results are freely available from http://www.ucbioinformatics.org/metabenchmark.html
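
Judging a tool's accuracy requires comparing its predicted community composition with the known composition of the test data set. The abstract does not state which metrics the benchmark uses, so the following is only a minimal sketch under assumed, commonly used choices: presence/absence precision, recall and F1 at a single taxonomic rank, plus the L1 distance between normalised relative abundances. The function name score_profile and the toy taxon names and abundances are hypothetical and do not come from the benchmark.

```python
from typing import Dict


def score_profile(truth: Dict[str, float], predicted: Dict[str, float]) -> Dict[str, float]:
    """Compare a predicted taxonomic profile with the known composition.

    Assumption (not from the source): both profiles map taxon names at one
    rank (e.g. genus) to relative abundances; taxa with abundance <= 0 are
    treated as absent.
    """
    true_taxa = {t for t, a in truth.items() if a > 0}
    pred_taxa = {t for t, a in predicted.items() if a > 0}

    # Presence/absence accuracy: true positives are taxa found in both sets.
    tp = len(true_taxa & pred_taxa)
    precision = tp / len(pred_taxa) if pred_taxa else 0.0
    recall = tp / len(true_taxa) if true_taxa else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0

    def normalise(profile: Dict[str, float]) -> Dict[str, float]:
        # Rescale positive abundances so they sum to 1.
        total = sum(a for a in profile.values() if a > 0)
        if total == 0:
            return {}
        return {t: a / total for t, a in profile.items() if a > 0}

    # Abundance accuracy: L1 distance over the union of observed taxa (0 = identical, 2 = disjoint).
    t_norm, p_norm = normalise(truth), normalise(predicted)
    taxa = set(t_norm) | set(p_norm)
    l1 = sum(abs(t_norm.get(t, 0.0) - p_norm.get(t, 0.0)) for t in taxa)

    return {"precision": precision, "recall": recall, "f1": f1, "l1": l1}


if __name__ == "__main__":
    # Toy data for illustration only.
    truth = {"Escherichia": 0.5, "Bacteroides": 0.3, "Clostridium": 0.2}
    predicted = {"Escherichia": 0.6, "Bacteroides": 0.3, "Salmonella": 0.1}
    print(score_profile(truth, predicted))
```

For this toy input, precision, recall and F1 are each 2/3 and the L1 distance is about 0.4. Keeping presence/absence and abundance scores separate mirrors the distinction between detecting the right taxa and estimating their proportions correctly. Runtime would be measured independently and set against these scores.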
