Minimus: a fast, lightweight genome assembler

Background: Genome assemblers have grown very large and complex in response to the need for algorithms that can handle the challenges of large whole-genome sequencing projects. Many of the most common uses of assemblers, however, are best served by a simpler type of assembler that requires fewer software components, uses less memory, and is far easier to install and run.

Results: We have developed the Minimus assembler to address these issues, and tested it on a range of assembly problems. We show that Minimus performs well on several small assembly tasks, including the assembly of viral genomes, individual genes, and BAC clones. In addition, we evaluate Minimus' performance in assembling bacterial genomes in order to assess its suitability as a component of a larger assembly pipeline. We show that, compared with other software currently used for these tasks, Minimus produces significantly fewer assembly errors, at the cost of generating a more fragmented assembly.

Conclusion: We find that for small genomes and other small assembly tasks, Minimus is faster and far more flexible than existing tools. Due to its small size and modular design, Minimus is perfectly suited to be a component of complex assembly pipelines. Minimus is released as an open-source software project, and the code is available as part of the AMOS project at SourceForge.
