Metagenomics workflow for hybrid assembly, differential coverage binning, metatranscriptomics and pathway analysis (MUFFIN)

Metagenomics has redefined many areas of microbiology. However, metagenome-assembled genomes (MAGs) are often fragmented, particularly when sequencing was performed with short reads. Recent long-read sequencing technologies promise to improve genome reconstruction, but integrating two different sequencing modalities complicates downstream analyses. We therefore developed MUFFIN, a complete metagenomic workflow that uses short and long reads to produce high-quality bins and their annotations. The workflow is written in Nextflow, a workflow orchestration software, to achieve high reproducibility and fast, straightforward use. It also produces the taxonomic classification and KEGG pathways of the bins, and RNA-Seq data can optionally be provided for quantification and annotation. We tested the workflow using twenty biogas reactor samples and assessed the capacity of MUFFIN to process and output the files needed to analyze the microbial community and its function. MUFFIN produces functional pathway predictions and, if RNA-Seq data are provided, de novo transcript annotations across the metagenomic sample and for each bin.

Author Summary
RVD developed and designed MUFFIN and wrote the first draft; BM and EBR critically read and corrected the manuscript; MH critically read the manuscript and made general adjustments to the metagenomic workflow; AV critically read the manuscript and made adjustments to the taxonomic classifications; CB supervised the project, designed the workflow, helped with the implementation, and revised the manuscript.
