An algorithm for automated closure during assembly

Background: Finishing is the process of improving the quality and utility of draft genome sequences generated by shotgun sequencing and computational assembly. Finishing can involve targeted sequencing. Finishing reads may be incorporated by manual or automated means. One automated method uses targeted addition by local re-assembly of gap regions. An obvious alternative uses de novo assembly of all the reads.

Results: A procedure called the bounding read algorithm was developed for assembly of shotgun reads plus finishing reads and their constraints, targeting repeat regions. The algorithm was implemented within the Celera Assembler software and its pyrosequencing-specific variant, CABOG. The implementation was tested on Sanger and pyrosequencing data from six genomes. The bounding read assemblies were compared to assemblies from two other methods on the same data. The algorithm generates improved assemblies of repeat regions, closing and tiling some gaps while degrading none.

Conclusions: The algorithm is useful for small-genome automated finishing projects. Our implementation is available as open-source from http://wgs-assembler.sourceforge.net under the GNU Public License.
