Finding All Approximate Gapped Palindromes

We study the problem of finding all maximal approximate gapped palindromes in a string. More specifically, given a string $S$ of length $n$, a parameter $q \ge 0$ and a threshold $k > 0$, the problem is to identify all substrings of $S$ of the form $uvw$ such that (1) the Levenshtein distance between $u$ and $w^r$ is at most $k$, where $w^r$ is the reverse of $w$, and (2) $v$ is a string of length $q$. The best previous work requires $O(k^2 n)$ time. In this paper, we propose an $O(kn)$-time algorithm for this problem that uses an incremental string comparison technique. The core technique in fact solves a more general incremental string comparison problem, one that allows the insertion, deletion, and substitution of multiple symbols.
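To make the problem definition concrete, the sketch below is a naive baseline in Python. It is not the $O(kn)$ algorithm described above, and it does not filter for maximality. It simply reports every occurrence $uvw$ with $|v| = q$, non-empty $u$ and $w$ (an assumption made here for illustration), and $\mathrm{ed}(u, w^r) \le k$. It relies on one observation: reversing both strings preserves edit distance, so $\mathrm{ed}(u, w^r) = \mathrm{ed}(u^r, w)$. Both $u^r$ and $w$ are then read outward from the gap, and a single dynamic-programming table per gap position covers every pair of arm lengths, giving $O(n^3)$ time overall. The function name `approx_gapped_palindromes` and its output tuple layout are hypothetical.

```python
from typing import Iterator, Tuple


def approx_gapped_palindromes(
    s: str, q: int, k: int
) -> Iterator[Tuple[int, int, int, int, int]]:
    """Yield (i, j, j + q, l, d) for every substring s[i:l] = u v w with
    u = s[i:j], v = s[j:j+q], w = s[j+q:l], u and w non-empty, and
    d = Levenshtein(u, reverse(w)) <= k.

    Hypothetical naive baseline: O(n^3) time, no maximality filter."""
    n = len(s)
    for j in range(n - q + 1):          # v = s[j:j+q] is the fixed-length gap
        left = s[:j][::-1]              # prefixes of `left` are u reversed
        right = s[j + q:]               # prefixes of `right` are w
        # ed(u, w^r) == ed(u^r, w), so D[x][y] = ed(left[:x], right[:y]).
        prev = list(range(len(right) + 1))  # row x = 0: ed("", right[:y]) = y
        for x in range(1, len(left) + 1):
            cur = [x] + [0] * len(right)    # column y = 0: ed(left[:x], "") = x
            for y in range(1, len(right) + 1):
                cur[y] = min(
                    prev[y] + 1,                                  # delete left[x-1]
                    cur[y - 1] + 1,                               # insert right[y-1]
                    prev[y - 1] + (left[x - 1] != right[y - 1]),  # match or substitute
                )
                if cur[y] <= k:
                    # u = s[j-x:j], w = s[j+q:j+q+y]
                    yield (j - x, j, j + q, j + q + y, cur[y])
            prev = cur
```

For example, with $S$ = `abcxcba`, $q = 1$ and $k = 0$, only the gap at `x` produces results: the exact gapped palindromes `cxc`, `bcxcb` and `abcxcba`. Raising $k$ to 1 on `abcxcbd` admits `abcxcbd` itself, since $\mathrm{ed}(\texttt{abc}, \texttt{dbc}) = 1$.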
