Restructuring Compressed Texts without Explicit Decompression

We consider the problem of {\em restructuring} compressed texts without explicit decompression. We present algorithms that convert a compressed representation of a string $T$, produced by any grammar-based compression algorithm, into the representation produced by one of several specific compression algorithms, including LZ77, LZ78, run-length encoding, and some grammar-based compression algorithms. These are the first algorithms that achieve running times polynomial in the size of the compressed input and output representations of $T$. Since most of the representations we consider can achieve exponential compression, our algorithms are theoretically faster in the worst case than any algorithm that first decompresses the string for the conversion.
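
To make the exponential-compression point concrete, the following is a minimal sketch and not the algorithms of the paper. It assumes the grammar is a straight-line program in which each rule is either one terminal character or a pair of earlier variables, and it converts that grammar to a run-length encoding bottom-up, without ever writing out $T$. The names `doubling_slp` and `slp_to_rle`, and the dictionary layout, are hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple, Union

# A rule is either a single terminal character or a pair of earlier variables.
Rule = Union[str, Tuple[str, str]]
Run = Tuple[str, int]  # (character, run length)


def doubling_slp(n: int) -> Dict[str, Rule]:
    """Hypothetical example grammar: X0 -> a, Xi -> X(i-1) X(i-1)."""
    rules: Dict[str, Rule] = {"X0": "a"}
    for i in range(1, n + 1):
        rules[f"X{i}"] = (f"X{i - 1}", f"X{i - 1}")
    return rules


def slp_to_rle(rules: Dict[str, Rule], start: str) -> List[Run]:
    """Run-length encode the string derived from `start`, bottom-up.

    Assumes every variable is listed after the variables it refers to
    (dict insertion order), so children are always processed first.
    """
    rle: Dict[str, List[Run]] = {}
    for name, rhs in rules.items():
        if isinstance(rhs, str):
            # Terminal rule: a single run of length 1.
            rle[name] = [(rhs, 1)]
            continue
        left, right = rle[rhs[0]], rle[rhs[1]]
        if left[-1][0] == right[0][0]:
            # The last run of the left part and the first run of the
            # right part use the same character, so they fuse into one.
            merged = (left[-1][0], left[-1][1] + right[0][1])
            rle[name] = left[:-1] + [merged] + right[1:]
        else:
            rle[name] = left + right
    return rle[start]


if __name__ == "__main__":
    # 41 rules describe a string of 2**40 characters; the RLE is one run.
    print(slp_to_rle(doubling_slp(40), "X40"))
```

Here 41 rules describe a string of $2^{40}$ characters, yet the conversion only ever handles one run. When every variable is reachable from the start symbol, no intermediate list is longer than the final run-length encoding, so this sketch does work polynomial in the grammar size and the output size, whereas decompressing first would take time proportional to $|T|$.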
