Optimizing and evaluating the reconstruction of metagenome-assembled microbial genomes

Background: Microbiome/host interactions describe characteristics that affect the host's health. Shotgun metagenomics involves sequencing a random subset of the microbiome to analyze its taxonomic and metabolic potential. Reconstructing DNA fragments from metagenomes into genomes (called metagenome-assembled genomes) assigns unknown fragments to taxa and functions and facilitates the discovery of novel organisms. Genome reconstruction combines sequence assembly with the sorting of assembled sequences into bins, each characteristic of a genome. However, the microbial community composition, including taxonomic and phylogenetic diversity, may influence genome reconstruction. We determine the optimal reconstruction method for four microbiome projects that varied in sequencing platform (IonTorrent and Illumina), diversity (high or low), and environment (coral reefs and kelp forests), using a set of parameters to select the optimal assembly and binning tools.

Methods: We tested the effects of the assembly and binning processes on population genome reconstruction using 105 marine metagenomes from 4 projects. Reconstructed genomes were obtained from each project using 3 assemblers (IDBA, MetaVelvet, and SPAdes) and 2 binning tools (GroopM and MetaBat). We assessed the efficiency of the assemblers using statistics including contig continuity and contig chimerism, and the effectiveness of the binning tools using genome completeness and taxonomic identification.

Results: SPAdes assembled more contigs (143,718 ± 124 contigs) of longer length (N50 = 1632 ± 108 bp) and incorporated the most sequences (19.65% of sequences assembled). Microbial richness and evenness were maintained across the assembly, suggesting few chimeric contigs. Compared with the other assemblers, SPAdes assembly was responsive to the biological and technological variation within the projects. Among the binning tools, MetaBat produced bins with less variation in GC content (average standard deviation: 1.49), low species richness (4.91 ± 0.66), and higher genome completeness (40.92 ± 1.75) across all projects. MetaBat extracted 115 bins from the 4 projects, of which 66 were identified as reconstructed metagenome-assembled genomes with sequences belonging to a specific genus. We identified 13 novel genomes, some of which were 100% complete but showed low similarity to genomes within databases.

Conclusions: We present a set of biologically relevant parameters for evaluating and selecting optimal assembly and binning tools. Of the tools we tested, the SPAdes assembler and the MetaBat binning tool reconstructed quality metagenome-assembled genomes for the four projects. We also conclude that metagenomes from microbial communities with high coverage of phylogenetically distinct organisms and low taxonomic diversity yield the highest-quality metagenome-assembled genomes.
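
Two of the evaluation statistics named above, N50 as a measure of contig continuity and the standard deviation of GC content within a bin, can be illustrated with a minimal Python sketch. The function names are hypothetical and are not taken from the study's pipeline. The sketch assumes N50 is the contig length at which the cumulative length of contigs, sorted longest first, reaches half of the total assembly length, and that GC variation is a population standard deviation over per-contig GC percentages; the study may have computed either value differently.

```python
from statistics import pstdev


def n50(contig_lengths):
    """Return the N50 of an assembly (assumed definition: the length L such
    that contigs of length >= L hold at least half of all assembled bases)."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


def gc_percent(sequence):
    """Return the GC content of one contig as a percentage of its length."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_std_within_bin(contigs):
    """Return the population standard deviation of per-contig GC percentages
    in one bin (assumption: the study may have used a sample standard
    deviation or a length-weighted variant instead)."""
    return pstdev([gc_percent(contig) for contig in contigs])
```

A lower value from `gc_std_within_bin` means the contigs in a bin have more uniform base composition, which is consistent with the bin representing a single genome.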
