A de novo metagenomic assembly program for shotgun DNA reads

MOTIVATION A high-quality assembly of the reads generated by shotgun sequencing is a substantial step in metagenome projects. Although traditional assemblers have been employed in the initial analysis of metagenomes, they cannot overcome the challenges posed by the features of metagenomic data.

RESULTS We present a de novo assembly approach and its implementation, named MAP (metagenomic assembly program). MAP is based on an improved overlap/layout/consensus (OLC) strategy combined with several special algorithms, and it uses mate pair information, which makes it more applicable to the shotgun DNA reads (recommended length >200 bp) currently in wide use in metagenome projects. Extensive tests on simulated data show that MAP can be superior to both Celera and Phrap for typical longer reads from Sanger sequencing, and that it has an evident advantage over Celera, Newbler and the more recent Genovo for typical shorter reads from 454 sequencing.

AVAILABILITY AND IMPLEMENTATION The source code of MAP is distributed as open source under the GNU GPL license. The MAP program and all simulated datasets are freely available at http://bioinfo.ctb.pku.edu.cn/MAP/
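For readers unfamiliar with the overlap/layout/consensus (OLC) idea named in the abstract, the toy sketch below illustrates only the overlap and layout steps on error-free reads. It is not MAP's algorithm: it ignores mate pairs, sequencing errors, reverse complements and the consensus step, and the function names `overlap_length` and `greedy_assemble` and the `min_overlap` parameter are hypothetical, chosen for illustration.

```python
def overlap_length(a: str, b: str, min_overlap: int) -> int:
    """Longest suffix of `a` equal to a prefix of `b`; 0 if shorter than min_overlap."""
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest exact overlap.

    Toy illustration only: exact matches, forward strand, no consensus step.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        # Overlap step: score every ordered pair of current sequences.
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap_length(a, b, min_overlap)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs  # no remaining overlap of at least min_overlap
        # Layout step: join the best pair, keeping the shared bases once.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
    print(greedy_assemble(reads))
```

A greedy merge like this can misjoin reads from repeats or from closely related genomes in a mixed sample, which is one reason assemblers for real metagenomic data need additional information such as mate pairs.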
