MetaMeta: integrating metagenome analysis tools to improve taxonomic profiling

Background: Many metagenome analysis tools are currently available to classify sequences and profile environmental samples. In particular, taxonomic profiling and binning methods are commonly used for such tasks. Tools in these two categories rely on several techniques, e.g., read mapping, k-mer alignment, and composition analysis. They also differ in how the corresponding reference sequence databases are constructed. In addition, different tools perform well on different datasets and configurations. All this variation makes it difficult for researchers to decide which methods to use. Installation, configuration and execution can also be difficult, especially when dealing with multiple datasets and tools.

Results: We propose MetaMeta, a pipeline to execute and integrate results from metagenome analysis tools. MetaMeta provides an easy workflow to run multiple tools on multiple samples, producing a single enhanced output profile for each sample. MetaMeta includes database generation, pre-processing, execution, and integration steps, allowing easy execution and parallelization. The integration relies on the co-occurrence of organisms reported by different methods as the main feature to improve community profiling, while accounting for differences in their databases.

Conclusions: In a controlled case with simulated and real data, we show that the integrated profiles of MetaMeta outperform the best single profile. Using the same input data, MetaMeta provides more sensitive and reliable results, with the presence of each organism supported by several methods. MetaMeta uses Snakemake and has six pre-configured tools, all available through the BioConda channel for easy installation (conda install -c bioconda metameta). The MetaMeta pipeline is open-source and can be downloaded at: https://gitlab.com/rki_bioinformatics.
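The co-occurrence idea behind the integration step can be illustrated with a toy sketch. This is not MetaMeta's actual algorithm: the function name `integrate_profiles`, the `min_support` threshold, the averaging rule, and the example tool and taxon names are all assumptions made for illustration, and the sketch ignores the differences between tool databases that MetaMeta accounts for.

```python
from collections import defaultdict


def integrate_profiles(profiles, min_support=2):
    """Toy co-occurrence merge (hypothetical, not MetaMeta's method).

    profiles: dict mapping tool name -> {taxon: relative abundance}.
    Keeps taxa reported by at least `min_support` tools and returns
    {taxon: (number of supporting tools, renormalised mean abundance)}.
    """
    support = defaultdict(int)    # how many tools report each taxon
    total = defaultdict(float)    # summed abundance over those tools
    for tool_profile in profiles.values():
        for taxon, abundance in tool_profile.items():
            if abundance > 0:
                support[taxon] += 1
                total[taxon] += abundance

    # Mean abundance across supporting tools, for sufficiently supported taxa.
    kept = {t: total[t] / support[t] for t in support if support[t] >= min_support}
    norm = sum(kept.values())
    if norm == 0:
        return {}
    # Renormalise so the integrated abundances sum to 1.
    return {t: (support[t], a / norm) for t, a in kept.items()}


# Hypothetical profiles from three tools for one sample.
profiles = {
    "tool_a": {"E. coli": 0.6, "B. subtilis": 0.4},
    "tool_b": {"E. coli": 0.5, "B. subtilis": 0.3, "S. aureus": 0.2},
    "tool_c": {"E. coli": 0.7, "P. putida": 0.3},
}
integrated = integrate_profiles(profiles, min_support=2)
for taxon, (n_tools, abundance) in sorted(integrated.items()):
    print(f"{taxon}\tsupport={n_tools}\tabundance={abundance:.3f}")
```

In this example, taxa reported by a single tool are dropped, and each remaining organism carries the number of tools supporting it, which mirrors the notion of presence "supported by several methods" only in a simplified form.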
