EULER-PCR: Finishing Experiments for Repeat Resolution

Genomic sequencing typically generates a large collection of unordered contigs or scaffolds. Contig ordering (also known as gap closure) is a non-trivial algorithmic and experimental problem, since even relatively simple-to-assemble bacterial genomes typically yield a large set of contigs. Neighboring contigs may be separated either by gaps in read coverage or by repeats. In the latter case we say that the contigs are separated by pseudogaps, and we emphasize the important difference between gap closure and pseudogap closure. Existing gap closure approaches do not distinguish between gaps and pseudogaps and treat both in the same way. We describe a new, fast strategy for closing pseudogaps (repeat resolution). Since the number of pseudogaps in highly repetitive genomes may exceed the number of gaps by an order of magnitude, this approach provides a significant advantage over existing gap closure methods.
