An evaluation of the accuracy and speed of metagenome analysis tools

Metagenome studies are becoming increasingly widespread, yielding important insights into microbial communities in diverse environments, from terrestrial and aquatic ecosystems to human skin and gut. With the advent of high-throughput sequencing platforms, large-scale shotgun sequencing approaches are now commonplace. However, a thorough independent benchmark comparing state-of-the-art metagenome analysis tools is lacking. Here, we present a benchmark in which the most widely used tools are tested on complex, realistic data sets. Our results clearly show that the most widely used tools are not necessarily the most accurate, that the most accurate tool is not necessarily the most time-consuming, and that there is a high degree of variability between available tools. These findings are important because the conclusions of any metagenomics study are affected by errors in the predicted community composition and functional capacity. Data sets and results are freely available from http://www.ucbioinformatics.org/metabenchmark.html
