Hierarchical scaffolding with Bambus.

The output of a genome assembler generally comprises a collection of contiguous DNA sequences (contigs) whose relative placement along the genome is not defined. A procedure called scaffolding is commonly used to order and orient these contigs using paired read information. This ordering of contigs is an essential step when finishing and analyzing the data from a whole-genome shotgun project. Most recent assemblers include a scaffolding module; however, users have little control over the scaffolding algorithm or the information produced. We thus developed a general-purpose scaffolder, called Bambus, which affords users significant flexibility in controlling the scaffolding parameters. Bambus was used recently to scaffold the low-coverage draft dog genome data. Most significantly, Bambus enables the use of linking data other than that inferred from mate-pair information. For example, the sequence of a completed genome can be used to guide the scaffolding of a related organism. We present several applications of Bambus: support for finishing, comparative genomics, analysis of the haplotype structure of genomes, and scaffolding of a mammalian genome at low coverage. Bambus is available as an open-source package from our Web site.
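To make the orienting step concrete, here is a minimal sketch of how paired-read links might be used to assign relative orientations to contigs. It is not Bambus's algorithm. The link format `(contig_a, contig_b, same_strand)`, the `min_support` threshold, and the function names are all assumptions made for illustration. Ordering contigs along the scaffold would also need distance estimates from the insert sizes, which this sketch omits.

```python
from collections import defaultdict, deque


def bundle_links(links, min_support=2):
    """Count links per contig pair and relative orientation.

    Each link is a hypothetical tuple (contig_a, contig_b, same_strand).
    Only bundles supported by at least `min_support` links are kept.
    """
    counts = defaultdict(int)
    for a, b, same_strand in links:
        pair = (a, b) if a <= b else (b, a)
        counts[(pair, same_strand)] += 1
    return {key: n for key, n in counts.items() if n >= min_support}


def orient_contigs(bundles):
    """Assign '+' or '-' to each contig by breadth-first propagation.

    Returns the orientation map and the set of contig pairs whose links
    contradict the orientations already assigned.
    """
    graph = defaultdict(list)
    for ((a, b), same_strand), _support in bundles.items():
        graph[a].append((b, same_strand))
        graph[b].append((a, same_strand))

    flip = {"+": "-", "-": "+"}
    orientation = {}
    conflicts = set()
    for start in sorted(graph):
        if start in orientation:
            continue
        orientation[start] = "+"  # arbitrary choice for each component
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, same_strand in graph[u]:
                expected = orientation[u] if same_strand else flip[orientation[u]]
                if v not in orientation:
                    orientation[v] = expected
                    queue.append(v)
                elif orientation[v] != expected:
                    conflicts.add(tuple(sorted((u, v))))
    return orientation, conflicts


# Toy input: the single c1-c3 link falls below min_support and is ignored.
links = [
    ("c1", "c2", True), ("c1", "c2", True),
    ("c2", "c3", False), ("c2", "c3", False),
    ("c1", "c3", True),
]
bundles = bundle_links(links, min_support=2)
orientation, conflicts = orient_contigs(bundles)
```

Requiring several links per bundle is one simple way to keep a single chimeric or misplaced read pair from driving the result. A threshold like `min_support` is the kind of parameter a general-purpose scaffolder could let users adjust.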
