Computing Lempel-Ziv Factorization Online

We present an algorithm that computes the Lempel-Ziv factorization of a word W of length n online in the following sense: it reads W from left to right and, after reading each block of r = O(log n) characters of W, updates the Lempel-Ziv factorization. The algorithm requires O(n) bits of space and O(n log n) time. It is based on a sparse suffix tree combined with wavelet trees.
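
To make the object being computed concrete, the following is a minimal sketch of a naive, offline Lempel-Ziv factorization. It is not the algorithm presented here: it uses no sparse suffix tree or wavelet trees, and it runs in quadratic time or worse. It assumes one common variant of the definition, in which each factor is either a single letter with no earlier occurrence or the longest prefix of the remaining text that also starts at an earlier position (overlap with the current factor allowed). The function name lz_factorize is hypothetical.

```python
def lz_factorize(w):
    """Naive Lempel-Ziv factorization of the string w.

    Assumed variant: each factor is the longest prefix of w[i:] that also
    occurs starting at some earlier position j < i (overlaps allowed),
    or a single fresh letter if no such prefix exists.
    """
    factors = []
    n = len(w)
    i = 0
    while i < n:
        best = 0
        # Try every earlier starting position j and extend the match.
        for j in range(i):
            k = 0
            while i + k < n and w[j + k] == w[i + k]:
                k += 1
            best = max(best, k)
        # A letter with no earlier occurrence forms a factor of length 1.
        length = max(best, 1)
        factors.append(w[i:i + length])
        i += length
    return factors


if __name__ == "__main__":
    print(lz_factorize("abababb"))  # ['a', 'b', 'abab', 'b']
    print(lz_factorize("aaaa"))     # ['a', 'aaa']
```

An online algorithm in the sense described above would maintain this list incrementally as blocks of characters arrive, rather than rescanning the whole prefix for every factor as this sketch does.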
