Space Efficient Linear Time Lempel-Ziv Factorization for Small Alphabets

We present a new linear time algorithm for computing the Lempel-Ziv factorization (LZ77) of a given string of length N over an alphabet of size σ that uses only N log N + O(σ log N) bits of working space. When the alphabet is small, this greatly improves on the previous best space requirement for linear time LZ77 factorization (Kärkkäinen et al., CPM 2013), which is 2N log N bits, i.e., two integer arrays of length N. Experiments show that, despite its added complexity, the algorithm is only around two to three times slower than the previous fastest linear time algorithms.
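For readers unfamiliar with the object being computed, the following is a minimal sketch of an LZ77 factorization under one common definition: each factor is the longest prefix of the remaining suffix that also occurs starting at an earlier position (overlaps allowed), or a single fresh character if there is none. This is a naive quadratic-time illustration, not the linear time algorithm of the paper; the function name `lz77_factorize` and the output encoding are assumptions made for this example.

```python
def lz77_factorize(s):
    """Naive LZ77 factorization (illustrative only, not linear time).

    Returns a list of factors. A factor is either (char, 0) for a
    character with no earlier occurrence, or (source, length) for a
    copy of `length` characters starting at earlier position `source`.
    """
    factors = []
    i, n = 0, len(s)
    while i < n:
        best_len, best_src = 0, -1
        # Try every earlier starting position j < i and keep the longest match.
        for j in range(i):
            k = 0
            # The match may run past position i (self-overlap is allowed).
            while i + k < n and s[j + k] == s[i + k]:
                k += 1
            if k > best_len:
                best_len, best_src = k, j
        if best_len == 0:
            factors.append((s[i], 0))  # fresh character
            i += 1
        else:
            factors.append((best_src, best_len))
            i += best_len
    return factors


print(lz77_factorize("abababab"))  # [('a', 0), ('b', 0), (0, 6)]
```

The sketch stores only the input and the output factors, but it spends quadratic time; the linear time algorithms discussed above instead rely on auxiliary integer arrays, which is where the N log N versus 2N log N bits comparison comes from.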

[1]  Simon J. Puglisi, et al. Lempel-Ziv factorization: Simple, fast, practical, 2013, ALENEX.

[2]  Hideo Bannai, et al. Simpler and Faster Lempel Ziv Factorization, 2013, 2013 Data Compression Conference.

[3]  Juha Kärkkäinen, et al. Linear Time Lempel-Ziv Factorization: Simple, Fast, Small, 2012, CPM.

[4]  Max Crochemore. Linear searching for a square in a word, 1984, Bull. EATCS.

[5]  Sen Zhang, et al. Two Efficient Algorithms for Linear Time Suffix Array Construction, 2011, IEEE Transactions on Computers.

[6]  Arnaud Lefebvre, et al. Linear-Time Computation of Local Periods, 2003, MFCS.

[7]  Lucian Ilie, et al. Computing Longest Previous Factor in linear time and applications, 2008, Inf. Process. Lett.

[8]  Juha Kärkkäinen, et al. Permuted Longest-Common-Prefix Array, 2009, CPM.

[9]  Peter Sanders, et al. Simple Linear Work Suffix Array Construction, 2003, ICALP.

[10]  Wojciech Rytter, et al. Application of Lempel-Ziv factorization to the approximation of grammar-based compression, 2002, Theor. Comput. Sci.

[11]  Lucian Ilie, et al. A comparison of index-based Lempel-Ziv LZ77 factorization algorithms, 2012, CSUR.

[12]  Hiroki Arimura, et al. Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications, 2001, CPM.

[13]  Gregory Kucherov, et al. Finding maximal repetitions in a word in linear time, 1999, 40th Annual Symposium on Foundations of Computer Science.

[14]  Juha Kärkkäinen, et al. Lightweight Lempel-Ziv Parsing, 2013, SEA.

[15]  Abraham Lempel, et al. A universal algorithm for sequential data compression, 1977, IEEE Trans. Inf. Theory.

[16]  Juha Kärkkäinen, et al. A Faster Grammar-Based Self-index, 2011, LATA.

[17]  Eugene W. Myers, et al. Suffix arrays: a new method for on-line string searches, 1993, SODA '90.

[18]  Ge Nong, et al. Practical linear-time O(1)-workspace suffix sorting for constant alphabets, 2013, TOIS.