Efficient retrieval of approximate palindromes in a run-length encoded string

In this paper, we study the palindrome retrieval problem for an input string compressed into run-length encoded form. Given a run-length encoded string rle(T), we show how to preprocess rle(T) to support subsequent queries for the longest palindrome centered at any specified position and having any specified number of mismatches between its arms. We present two algorithms for the problem, both taking time and space polynomial in the compressed string size. Let $n$ denote the number of runs of rle(T) and let $k$ denote the number of mismatches. The first algorithm, devised for small $k$, identifies the desired palindrome in $O(\log n + \min\{k, n\})$ time after $O(n \log n)$-time preprocessing, while the second algorithm achieves $O(\log^2 n)$ query time, independent of $k$, after $O(n^2 \log n)$-time preprocessing.
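The abstract does not spell out how a query works on the compressed form. As a point of reference only, here is a minimal sketch of a simple baseline, not either of the paper's algorithms. It answers the same query directly on the runs, using the observation that while each arm stays inside a single run, every character pair in that stretch either matches or mismatches, so the whole stretch can be consumed in one step. Assumptions: only odd-length palindromes centered at a 0-based character position c are handled, the answer is reported as a radius r (the palindrome is T[c-r..c+r]), and the names `rle`, `longest_k_mismatch_palindrome` and `naive_radius` are hypothetical.

```python
from bisect import bisect_right
from itertools import groupby


def rle(text):
    """Run-length encode text as a list of (char, run_length) pairs."""
    return [(ch, sum(1 for _ in grp)) for ch, grp in groupby(text)]


def longest_k_mismatch_palindrome(runs, c, k):
    """Largest r such that T[c-r..c+r] has at most k mismatched pairs
    T[c-i] != T[c+i] (1 <= i <= r), where T is the string encoded by runs.

    Baseline sketch (not the paper's method): odd-length palindromes only,
    extended outward one run-aligned stretch at a time.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    # starts[j] is the 0-based position of the first character of run j.
    # (Computed per call here; it could be precomputed once per string.)
    starts, pos = [], 0
    for _, length in runs:
        starts.append(pos)
        pos += length
    if not 0 <= c < pos:
        raise IndexError("center out of range")
    j = bisect_right(starts, c) - 1  # run containing position c (binary search)
    # Each arm is a cursor: (run index, characters left in that run in the
    # direction the arm grows), leftwards from c-1 and rightwards from c+1.
    li, lrem = j, c - starts[j]
    ri, rrem = j, starts[j] + runs[j][1] - 1 - c
    radius, used = 0, 0
    while True:
        # Move each cursor past runs it has exhausted.
        while lrem == 0 and li > 0:
            li -= 1
            lrem = runs[li][1]
        while rrem == 0 and ri < len(runs) - 1:
            ri += 1
            rrem = runs[ri][1]
        if lrem == 0 or rrem == 0:
            break  # one arm reached an end of the string
        step = min(lrem, rrem)  # both arms are constant over this stretch
        if runs[li][0] != runs[ri][0]:
            # Every pair in the stretch mismatches: spend only what the
            # remaining mismatch budget allows.
            step = min(step, k - used)
            if step == 0:
                break
            used += step
        radius += step
        lrem -= step
        rrem -= step
    return radius


def naive_radius(text, c, k):
    """Brute-force reference on the uncompressed text, one pair at a time."""
    r, used = 0, 0
    while c - r - 1 >= 0 and c + r + 1 < len(text):
        if text[c - r - 1] != text[c + r + 1]:
            if used == k:
                break
            used += 1
        r += 1
    return r
```

For example, `rle("aaabaaa")` gives `[('a', 3), ('b', 1), ('a', 3)]`, and the query at c = 3 with k = 0 returns 3 after a single three-character stretch; on `"aaaabbb"` with c = 3 and k = 1 it returns 1, since every pair around that center mismatches. Each full step exhausts at least one cursor's current run, so the walk takes time proportional to the number of runs the palindrome spans (O(n) in the worst case). The sketch does not attempt the preprocessing that gives the $O(\log n + \min\{k, n\})$ and $O(\log^2 n)$ query bounds stated above.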
