MetaCluster 5.0: a two-round binning approach for metagenomic data for low-abundance species in a noisy sample

Motivation: Metagenomic binning remains an important topic in metagenomic analysis. Existing unsupervised binning methods for next-generation sequencing (NGS) reads do not perform well on (i) samples containing low-abundance species or (ii) samples, even those whose species have high abundance, that also contain many extremely low-abundance species. Both situations are common in real metagenomic datasets, so binning methods that can handle them are desirable.

Results: We propose a two-round binning method, MetaCluster 5.0, that aims to identify both low-abundance and high-abundance species in the presence of the large amount of noise contributed by many extremely low-abundance species. In summary, MetaCluster 5.0 uses a filtering strategy to remove noise from the extremely low-abundance species and separates reads of high-abundance species from those of low-abundance species in two different rounds. Reads from high-abundance species are grouped with high confidence using a large w, and binning then expands to low-abundance species using a relaxed (shorter) w. To overcome the low coverage of low-abundance species, multiple w values are used to group reads that share overlapping w-mers (length-w substrings). Compared with the recent tools TOSS and MetaCluster 4.0, MetaCluster 5.0 finds more species (especially those with low abundance, e.g. 6× to 10×) and achieves better sensitivity and specificity while using less memory and running time.

Availability: http://i.cs.hku.hk/~alse/MetaCluster/

Contact: chin@cs.hku.hk
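The two-round idea of grouping reads by shared w-mers, strictly first and then with relaxed w, can be illustrated with a toy sketch. It is not MetaCluster 5.0's implementation: the union-find grouping, the read-support thresholds (`high_min`, `low_min`), the toy w values and the function names `group_by_shared_wmers` and `two_round_binning` are all hypothetical, chosen so the example runs on a handful of short reads. Noise filtering is simplified here to ignoring w-mers supported by too few reads, and reads still ungrouped after both rounds are treated as noise.

```python
from collections import Counter, defaultdict
from typing import Iterator


class UnionFind:
    """Disjoint-set forest used to merge reads that share a w-mer."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def wmers(read: str, w: int) -> Iterator[str]:
    """Yield every length-w substring (w-mer) of a read."""
    return (read[i:i + w] for i in range(len(read) - w + 1))


def group_by_shared_wmers(reads: list[str], ids: list[int], w: int,
                          min_count: int) -> list[list[int]]:
    """Group the reads in `ids` that share a w-mer found in >= min_count of them.

    Hypothetical simplification: w-mers with lower read support are ignored
    (a stand-in for noise filtering); reads left alone are not returned.
    """
    # Number of distinct reads (among `ids`) that contain each w-mer.
    support = Counter(m for i in ids for m in set(wmers(reads[i], w)))
    uf = UnionFind(len(reads))
    first_seen: dict[str, int] = {}
    for i in ids:
        for m in set(wmers(reads[i], w)):
            if support[m] < min_count:
                continue  # too rare to trust as evidence of a shared genome
            if m in first_seen:
                uf.union(first_seen[m], i)
            else:
                first_seen[m] = i
    groups: dict[int, list[int]] = defaultdict(list)
    for i in ids:
        groups[uf.find(i)].append(i)
    return [g for g in groups.values() if len(g) > 1]


def two_round_binning(reads, high_w=8, high_min=3, low_ws=(6, 5), low_min=2):
    """Round 1: long w, high support -> high-abundance groups.
    Round 2: shorter w values on the leftover reads -> low-abundance groups.
    Returns (high_groups, low_groups, unbinned_read_ids)."""
    all_ids = list(range(len(reads)))
    high = group_by_shared_wmers(reads, all_ids, high_w, high_min)
    assigned = {i for g in high for i in g}
    remaining = [i for i in all_ids if i not in assigned]
    low: list[list[int]] = []
    for w in low_ws:  # progressively shorter (more relaxed) w values
        found = group_by_shared_wmers(reads, remaining, w, low_min)
        low.extend(found)
        assigned.update(i for g in found for i in g)
        remaining = [i for i in remaining if i not in assigned]
    return high, low, remaining  # leftover reads are treated as noise here


if __name__ == "__main__":
    toy_reads = ["AAAACCCCGG"] * 3 + ["CTCTCTAG", "TCTCTAGG", "GAGAGAGA"]
    print(two_round_binning(toy_reads))  # ([[0, 1, 2]], [[3, 4]], [5])
```

In the toy run, the three identical reads meet the strict first-round support and form one group; reads 3 and 4 only share 6-mers, so they are grouped in the second round; read 5 shares nothing and stays unbinned. The toy w values only suit these 8 to 10 bp reads; sensible values for real NGS reads would depend on read length and are not taken from the paper.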
