Meta-IDBA: a de Novo assembler for metagenomic data

Motivation: Next-generation sequencing techniques allow us to generate reads from a microbial environment in order to analyze the microbial community. However, assembling of a set of mixed reads from different species to form contigs is a bottleneck of metagenomic research. Although there are many assemblers for assembling reads from a single genome, there are no assemblers for assembling reads in metagenomic data without reference genome sequences. Moreover, the performances of these assemblers on metagenomic data are far from satisfactory, because of the existence of common regions in the genomes of subspecies and species, which make the assembly problem much more complicated. Results: We introduce the Meta-IDBA algorithm for assembling reads in metagenomic data, which contain multiple genomes from different species. There are two core steps in Meta-IDBA. It first tries to partition the de Bruijn graph into isolated components of different species based on an important observation. Then, for each component, it captures the slight variants of the genomes of subspecies from the same species by multiple alignments and represents the genome of one species, using a consensus sequence. Comparison of the performances of Meta-IDBA and existing assemblers, such as Velvet and Abyss for different metagenomic datasets shows that Meta-IDBA can reconstruct longer contigs with similar accuracy. Availability: Meta-IDBA toolkit is available at our website http://www.cs.hku.hk/~alse/metaidba. Contact: chin@cs.hku.hk

[1]  Alexander F. Auch,et al.  MEGAN analysis of metagenomic data. , 2007, Genome research.

[2]  Mark J. P. Chaisson,et al.  De novo fragment assembly with short mate-paired reads: Does the read length matter? , 2009, Genome research.

[3]  John C. Wooley,et al.  A Primer on Metagenomics , 2010, PLoS Comput. Biol..

[4]  A. Salamov,et al.  Use of simulated data sets to evaluate the fidelity of metagenomic processing methods , 2007, Nature Methods.

[5]  Huanming Yang,et al.  De novo assembly of human genomes with massively parallel short read sequencing. , 2010, Genome research.

[6]  Mihai Pop,et al.  Genome assembly reborn: recent computational challenges , 2009, Briefings Bioinform..

[7]  Tatiana A. Tatusova,et al.  NCBI Reference Sequences: current status, policy and new initiatives , 2008, Nucleic Acids Res..

[8]  N. Youssef,et al.  Species richness in soil bacterial communities: a proposed approach to overcome sample size bias. , 2008, Journal of microbiological methods.

[9]  P. Bork,et al.  A human gut microbial gene catalogue established by metagenomic sequencing , 2010, Nature.

[10]  W. J. Kent,et al.  BLAT--the BLAST-like alignment tool. , 2002, Genome research.

[11]  Armin R. Mikler,et al.  Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology, BCB 2010, Niagara Falls, NY, USA, August 2-4, 2010 , 2010, BCB.

[12]  K. Lindblad-Toh,et al.  Assisted assembly: how to improve a de novo genome assembly by using related species , 2009, Genome Biology.

[13]  Daniel R. Zerbino,et al.  Pebble and Rock Band: Heuristic Resolution of Repeats and Scaffolding in the Velvet Short-Read de Novo Assembler , 2009, PloS one.

[14]  P. Pevzner,et al.  An Eulerian path approach to DNA fragment assembly , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[15]  Mark J. P. Chaisson,et al.  Short read fragment assembly of bacterial genomes. , 2008, Genome research.

[16]  Siu-Ming Yiu,et al.  MetaCluster: unsupervised binning of environmental genomic fragments and taxonomic annotation , 2010, BCB '10.

[17]  Yi Luo,et al.  How independent are the appearances of n-mers in different genomes? , 2004, Bioinform..

[18]  John Bunge,et al.  Predicting microbial species richness. , 2006, Proceedings of the National Academy of Sciences of the United States of America.

[19]  E. Birney,et al.  Velvet: algorithms for de novo short read assembly using de Bruijn graphs. , 2008, Genome research.

[20]  Yu-Wei Wu,et al.  A Novel Abundance-Based Algorithm for Binning Metagenomic Sequences Using l-Tuples , 2010, RECOMB.

[21]  Naryttza N. Diaz,et al.  Phylogenetic classification of short environmental DNA fragments , 2008, Nucleic acids research.

[22]  Siu-Ming Yiu,et al.  IDBA - A Practical Iterative de Bruijn Graph De Novo Assembler , 2010, RECOMB.

[23]  Steven J. M. Jones,et al.  Abyss: a Parallel Assembler for Short Read Sequence Data Material Supplemental Open Access , 2022 .