Accurate binning of metagenomic contigs via automated clustering sequences using information of genomic signatures and marker genes

Metagenomics, the application of shotgun sequencing, facilitates the reconstruction of the genomes of individual species from natural environments. A major challenge in the genome recovery domain is to agglomerate or ‘bin’ sequences assembled from metagenomic reads into individual groups. Metagenomic binning without consideration of reference sequences enables the comprehensive discovery of new microbial organisms and aids in the microbial genome reconstruction process. Here we present MyCC, an automated binning tool that combines genomic signatures, marker genes and optional contig coverages within one or multiple samples, in order to visualize the metagenomes and to identify the reconstructed genomic fragments. We demonstrate the superior performance of MyCC compared to other binning tools including CONCOCT, GroopM, MaxBin and MetaBAT on both synthetic and real human gut communities with a small sample size (one to 11 samples), as well as on a large metagenome dataset (over 250 samples). Moreover, we demonstrate the visualization of metagenomes in MyCC to aid in the reconstruction of genomes from distinct bins. MyCC is freely available at http://sourceforge.net/projects/sb2nhri/files/MyCC/.

[1]  Edward C. Uberbacher,et al.  Gene and translation initiation site prediction in metagenomic sequences , 2012, Bioinform..

[2]  Robert C. Edgar,et al.  BIOINFORMATICS APPLICATIONS NOTE , 2001 .

[3]  Anders F. Andersson,et al.  Binning metagenomic contigs by coverage and composition , 2014, Nature Methods.

[4]  P. Bork,et al.  Accurate and universal delineation of prokaryotic species , 2013, Nature Methods.

[5]  B. Snel,et al.  Toward Automatic Reconstruction of a Highly Resolved Tree of Life , 2006, Science.

[6]  Alexandros Stamatakis,et al.  Metagenomic species profiling using universal phylogenetic marker genes , 2013, Nature Methods.

[7]  S. Tringe,et al.  Metagenomic Discovery of Biomass-Degrading Genes and Genomes from Cow Rumen , 2011, Science.

[8]  F. Raymond,et al.  which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Ray Meta: scalable de novo metagenome assembly and profiling , 2012 .

[9]  Miriam L. Land,et al.  Trace: Tennessee Research and Creative Exchange Prodigal: Prokaryotic Gene Recognition and Translation Initiation Site Identification Recommended Citation Prodigal: Prokaryotic Gene Recognition and Translation Initiation Site Identification , 2022 .

[10]  Connor T. Skennerton,et al.  CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes , 2015, Genome research.

[11]  Dongwan D. Kang,et al.  MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities , 2015, PeerJ.

[12]  Ulrike von Luxburg,et al.  A tutorial on spectral clustering , 2007, Stat. Comput..

[13]  Hideaki Tanaka,et al.  MetaVelvet: an extension of Velvet assembler to de novo metagenome assembly from short sequence reads , 2011, BCB '11.

[14]  Paul Wilmes,et al.  Alignment-free Visualization of Metagenomic Data by Nonlinear Dimension Reduction , 2014, Scientific Reports.

[15]  Delbert Dueck,et al.  Clustering by Passing Messages Between Data Points , 2007, Science.

[16]  Brian C. Thomas,et al.  Community-wide analysis of microbial genome sequence signatures , 2009, Genome Biology.

[17]  Blake A. Simmons,et al.  MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets , 2016, Bioinform..

[18]  Donovan Parks,et al.  GroopM: an automated tool for the recovery of population genomes from related metagenomes , 2014, PeerJ.

[19]  M. David,et al.  Metagenomic analysis of a permafrost microbial community reveals a rapid response to thaw , 2011, Nature.

[20]  Bahlul Haider,et al.  Omega: an Overlap-graph de novo Assembler for Metagenomics , 2014, Bioinform..

[21]  M. Pellegrini,et al.  Distinct Shifts in Microbiota Composition during Drosophila Aging Impair Intestinal Function and Drive Mortality. , 2015, Cell reports.

[22]  Piotr Gawron,et al.  VizBin - an application for reference-independent visualization and human-augmented binning of metagenomic data , 2015, Microbiome.

[23]  S. Tringe,et al.  MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm , 2014, Microbiome.

[24]  Alison S. Waller,et al.  Assessment of Metagenomic Assembly Using Simulated Next Generation Sequencing Data , 2012, PloS one.

[25]  Torsten Seemann,et al.  Prokka: rapid prokaryotic genome annotation , 2014, Bioinform..

[26]  Alexander Sczyrba,et al.  Nonlinear Dimensionality Reduction for Cluster Identification in Metagenomic Samples , 2013, 2013 17th International Conference on Information Visualisation.

[27]  Jens Roat Kultima,et al.  Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes , 2014, Nature Biotechnology.

[28]  John Aitchison,et al.  The Statistical Analysis of Compositional Data , 1986 .

[29]  Peer Bork,et al.  MOCAT: A Metagenomics Assembly and Gene Prediction Toolkit , 2012, PloS one.

[30]  Steven L Salzberg,et al.  Fast gapped-read alignment with Bowtie 2 , 2012, Nature Methods.

[31]  Brian C. Thomas,et al.  Time series community genomics analysis reveals rapid shifts in bacterial species, strains, and phage during infant gut colonization , 2013, Genome research.

[32]  C. Quince,et al.  Comparative metagenomic and rRNA microbial diversity characterization using archaeal and bacterial synthetic communities. , 2013, Environmental microbiology.

[33]  Siu-Ming Yiu,et al.  Meta-IDBA: a de Novo assembler for metagenomic data , 2011, Bioinform..