An Algorithm to Solve the Motif Alignment Problem for Approximate Nested Tandem Repeats

An approximate nested tandem repeat (NTR) in a string T is a complex repetitive structure consisting of many approximate copies of two substrings x and X ("motifs") interspersed with one another. NTRs have been found in real DNA sequences and are expected to have applications in evolutionary studies, both as a tool for understanding concerted evolution and as a potential marker in population studies. In this paper we describe software tools developed for database searches for NTRs. After a first program, NTRFinder, identifies putative NTR motifs, a confirmation step aligns the putative NTR against exact NTRs built from the putative template motifs x and X. We describe an algorithm that solves this alignment problem in O(|T|(|x| + |X|)) space and time. Our alignment algorithm is based on Fischetti et al.'s wraparound dynamic programming.
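To illustrate the wraparound idea on which the alignment is based, the following is a minimal sketch for the simpler single-motif case only: it computes the edit distance between a text and the closest exact tandem repeat of one motif. It is not the NTR alignment algorithm of this paper, which handles two interspersed motifs x and X. The function name `wraparound_distance` and the unit edit costs are assumptions made for the example.

```python
def wraparound_distance(text: str, motif: str) -> int:
    """Edit distance from `text` to the nearest string motif^k (k >= 0).

    Illustrative single-motif sketch with unit costs (an assumption);
    not the two-motif NTR alignment described in the paper.
    """
    if not motif:
        raise ValueError("motif must be non-empty")
    m = len(motif)

    # prev[j]: best cost of aligning the text read so far against
    # some number of whole motif copies followed by motif[:j].
    # With no text read, reaching state j costs j skipped motif characters.
    prev = list(range(m))

    for ch in text:
        row = [0] * m
        for j in range(m):
            k = (j - 1) % m  # predecessor state; wraps from 0 back to m - 1
            match_or_sub = prev[k] + (ch != motif[k])
            extra_text_char = prev[j] + 1
            row[j] = min(match_or_sub, extra_text_char)

        # Skipping motif characters moves along the row, and the wraparound
        # makes the row cyclic. Two left-to-right passes are enough to
        # propagate the row minimum all the way around the cycle.
        for _ in range(2):
            for j in range(m):
                k = (j - 1) % m
                row[j] = min(row[j], row[k] + 1)

        prev = row

    # State 0 means the alignment ends on a copy boundary.
    return prev[0]
```

Only two rows of length |x| are kept, so this sketch uses O(|x|) space and O(|T||x|) time. For example, `wraparound_distance("ACGACTACG", "ACG")` evaluates to 1, because a single substitution turns the text into three exact copies of the motif.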

[1]  G. Benson, et al. Tandem repeats finder: a program to analyze DNA sequences, 1999, Nucleic Acids Research.

[2]  Vincent A. Fischetti, et al. Identifying Periodic Occurrences of a Template with Applications to Protein Structure, 1993, Inf. Process. Lett.

[3]  Gonzalo Navarro, et al. A guided tour to approximate string matching, 2001, CSUR.

[4]  M. Hendy, et al. NTRFinder: an algorithm to find nested tandem repeats, 2010.

[5]  M. S. Waterman, et al. Identification of common molecular subsequences, 1981, Journal of Molecular Biology.

[6]  Deborah Joseph, et al. Beyond tandem repeats: complex pattern structures and distant regions of similarity, 2002, ISMB.

[7]  Dan Geiger, et al. Finding approximate tandem repeats in genomic sequences, 2005, Journal of Computational Biology.

[8]  Aaron M. Newman, et al. XSTREAM: A practical algorithm for identification and architecture modeling of tandem repeats in protein sequences, 2007, BMC Bioinformatics.

[9]  Gad M. Landau, et al. Identifying Periodic Occurrences of a Template with Applications to Protein Structures, 1992, CPM.

[10]  Dan Gusfield, et al. Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology, 1997.

[11]  Franco P. Preparata, et al. A Novel Approach to the Detection of Genomic Approximate Tandem Repeats in the Levenshtein Metric, 2007, J. Comput. Biol.

[12]  S. B. Needleman, et al. A general method applicable to the search for similarities in the amino acid sequence of two proteins, 1970, Journal of Molecular Biology.