DNPTrapper: an assembly editing tool for finishing and analysis of complex repeat regions

Background: Many genome projects are left unfinished due to complex, repeated regions. Finishing is the most time-consuming step in sequencing, and current finishing tools are not designed with particular attention to the repeat problem.

Results: We have developed DNPTrapper, a shotgun sequence finishing tool specifically designed to address the problems posed by repeated regions in the target sequence. The program detects and visualizes single-base differences between nearly identical repeat copies, and offers the overview and flexibility needed to rapidly resolve complex regions within a working session. The use of a database allows large amounts of data to be stored and handled, and allows viewing of mammalian-size genomes. The program is available under an Open Source license.

Conclusion: With DNPTrapper, it is possible to separate repeated regions that were previously considered impossible to resolve, and finishing tasks that previously took days or weeks can be resolved within hours or even minutes.
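To illustrate what detecting single-base differences between nearly identical repeat copies can involve, the following is a minimal sketch and not DNPTrapper's actual implementation. It assumes the reads are already aligned to equal length, with `-` marking gaps, and flags a column as a candidate when at least two different bases are each supported by several reads, since a base seen in only one read is more likely a sequencing error. The function name `candidate_columns` and the `min_support` threshold are hypothetical choices made for this example.

```python
from collections import Counter


def candidate_columns(aligned_reads, min_support=2):
    """Return the column indices where at least two different bases
    are each seen in `min_support` or more reads.

    Hypothetical helper for illustration only; the input is assumed
    to be a list of equal-length aligned read strings.
    """
    if not aligned_reads:
        return []
    width = len(aligned_reads[0])
    if any(len(read) != width for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")

    columns = []
    for i in range(width):
        # Count only real bases; gaps and ambiguity codes are ignored.
        counts = Counter(read[i] for read in aligned_reads if read[i] in "ACGT")
        supported = [base for base, n in counts.items() if n >= min_support]
        if len(supported) >= 2:
            columns.append(i)
    return columns


reads = [
    "ACGTACGT",
    "ACGTACGT",
    "ACGAACGT",
    "ACGAACGT",
    "ACGTACCT",  # the lone C at column 6 is treated as a likely error
]
print(candidate_columns(reads))  # prints [3]
```

Requiring support from more than one read is a simple way to separate repeat-copy differences from isolated sequencing errors; the threshold would need tuning to the coverage and error rate of a real data set.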
