Converting SLP to LZ78 in almost Linear Time

Given a straight-line program (SLP) of size \(n\), we are interested in constructing the LZ78 factorization of the corresponding text. We show how to perform such a conversion in \(\mathcal{O}(n+m\log m)\) time, where \(m\) is the number of LZ78 codewords. This improves on the previously known \(\mathcal{O}(n\sqrt{N}+m\log N)\) solution [Bannai et al., SPIRE 2012], where \(N\) is the length of the text. The main tools in our algorithm are a data structure that allows us to efficiently operate on labels of the paths in a growing trie, and a method of recompressing the parse whenever doing so decreases its size.
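The following naive baseline, sketched in Python, makes the input and output of the conversion concrete. It fully expands the SLP and then runs the textbook trie-based LZ78 parse. Its running time is proportional to the text length \(N\), which can be exponential in \(n\). That is exactly the cost the algorithm above avoids, so this is not the paper's method. The SLP encoding and the function names `expand_slp` and `lz78` are illustrative assumptions. In this encoding, the SLP is a list of rules. Each rule is either a single character or a pair of indices of earlier rules, and the last rule derives the text.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical SLP encoding (an assumption for this sketch): rule i is either
# a terminal character (str) or a pair (j, k) with j, k < i, meaning X_i -> X_j X_k.
Rule = Union[str, Tuple[int, int]]


def expand_slp(rules: List[Rule]) -> str:
    """Decompress the SLP whose last rule derives the text.

    Walks the derivation tree left to right with an explicit stack, so it
    runs in O(N) time for a text of length N (N may be exponential in n).
    """
    out: List[str] = []
    stack = [len(rules) - 1]
    while stack:
        rule = rules[stack.pop()]
        if isinstance(rule, str):
            out.append(rule)
        else:
            left, right = rule
            stack.append(right)  # pushed first, so the left child is expanded first
            stack.append(left)
    return "".join(out)


def lz78(text: str) -> List[Tuple[int, str]]:
    """Textbook trie-based LZ78 parse.

    Each codeword is (index of the longest earlier phrase that prefixes the
    current position, next character); index 0 denotes the empty phrase.
    If the text ends inside the trie, the last codeword has no new character.
    """
    trie: List[Dict[str, int]] = [{}]  # trie[v]: character -> child id; node 0 is the root
    codewords: List[Tuple[int, str]] = []
    i = 0
    while i < len(text):
        node = 0
        # Follow the trie while the next character extends a known phrase.
        while i < len(text) and text[i] in trie[node]:
            node = trie[node][text[i]]
            i += 1
        if i == len(text):
            codewords.append((node, ""))
            break
        # New phrase = phrase(node) + text[i]; its node id equals its codeword index.
        trie[node][text[i]] = len(trie)
        trie.append({})
        codewords.append((node, text[i]))
        i += 1
    return codewords


if __name__ == "__main__":
    # X0 -> a, X1 -> b, X2 -> X0 X1, X3 -> X2 X0, X4 -> X3 X2, X5 -> X4 X3
    rules: List[Rule] = ["a", "b", (0, 1), (2, 0), (3, 2), (4, 3)]
    text = expand_slp(rules)
    print(text)        # abaababa
    print(lz78(text))  # [(0, 'a'), (0, 'b'), (1, 'a'), (2, 'a'), (4, '')]
```

On this six-rule SLP, the text `abaababa` parses into the \(m = 5\) phrases a, b, aa, ba, ba. The first component of each codeword is the index of the earlier phrase it extends. The final phrase reuses phrase 4 because the text ends inside the trie.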

[1] Wojciech Rytter et al. Almost-optimal fully LZW-compressed pattern matching. Proceedings DCC'99 Data Compression Conference, 1999.

[2] Ayumi Shinohara et al. An Improved Pattern Matching Algorithm for Strings in Terms of Straight-Line Programs. CPM, 1997.

[3] Gad M. Landau et al. A Subquadratic Sequence Alignment Algorithm for Unrestricted Scoring Matrices. SIAM J. Comput., 2003.

[4] Craig G. Nevill-Manning et al. Compression by induction of hierarchical grammars. Proceedings of IEEE Data Compression Conference (DCC'94), 1994.

[5] Stephen Alstrup et al. Improved Algorithms for Finding Level Ancestors in Dynamic Trees. ICALP, 2000.

[6] Abraham Lempel et al. A universal algorithm for sequential data compression. IEEE Trans. Inf. Theory, 1977.

[7] Pawel Gawrychowski. Faster Algorithm for Computing the Edit Distance between SLP-Compressed Strings. SPIRE, 2012.

[8] Hideo Bannai et al. Efficient LZ78 Factorization of Grammar Compressed Text. SPIRE, 2012.

[9] Pawel Gawrychowski. Tying up the loose ends in fully LZW-compressed pattern matching. STACS, 2012.

[10] Alistair Moffat et al. Off-line dictionary-based compression. 2000.

[11] Wojciech Rytter. Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theor. Comput. Sci., 2003.

[12] Wojciech Rytter et al. Extracting Powers and Periods in a String from Its Runs Structure. SPIRE, 2010.

[13] Hideo Bannai et al. Fast q-gram mining on SLP compressed strings. J. Discrete Algorithms, 2011.

[14] Wojciech Rytter et al. An Efficient Pattern-Matching Algorithm for Strings with Short Descriptions. Nord. J. Comput., 1997.

[15] Bin Ma et al. The similarity metric. IEEE Transactions on Information Theory, 2001.

[16] Robin Milner et al. On Observing Nondeterminism and Concurrency. ICALP, 1980.

[17] Artur Jez. Faster Fully Compressed Pattern Matching by Recompression. ICALP, 2012.

[18] Ming Li et al. Genre Classification via an LZ78-Based String Kernel. ISMIR, 2005.

[19] Gad M. Landau et al. A Unified Algorithm for Accelerating Edit-Distance Computation via Text-Compression. STACS, 2009.

[20] Abraham Lempel et al. Compression of individual sequences via variable-rate coding. IEEE Trans. Inf. Theory, 1978.

[21] Yury Lifshits et al. Processing Compressed Texts: A Tractability Border. CPM, 2007.

[22] Hideo Bannai et al. Faster Subsequence and Don't-Care Pattern Matching on Compressed Texts. CPM, 2011.

[23] Ming Li et al. Image Classification Via LZ78 Based String Kernel: A Comparative Study. PAKDD, 2006.

[24] Ayumi Shinohara et al. Speeding Up String Pattern Matching by Text Compression: The Dawn of a New Era. 2001.

[25] Ming Li et al. An LZ78 Based String Kernel. ADMA, 2005.

[26] Richard Cole et al. Dynamic LCA queries on trees. SODA '99, 1999.