Practical Parallel Lempel-Ziv Factorization

In the age of big data, the need for efficient data compression algorithms has grown. A widely used data compression method is Lempel-Ziv-77 (LZ77), which is a subroutine in popular compression packages such as gzip and PKZIP. There has been considerable recent effort on developing practical sequential algorithms for Lempel-Ziv factorization (equivalent to LZ77 compression), but research on practical parallel implementations has been less satisfactory. In this work, we present a simple work-efficient parallel algorithm for Lempel-Ziv factorization. We show theoretically that our algorithm requires linear work and runs in O(log² n) time (randomized) for constant alphabets and O(n^ϵ) time (ϵ < 1) for integer alphabets. We present experimental results showing that our algorithm is efficient and achieves good speedup with respect to the best sequential implementations of Lempel-Ziv factorization.
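
For readers unfamiliar with the term, the following is a minimal sequential sketch of what a Lempel-Ziv factorization computes. It is not the parallel algorithm of this work, nor any of the efficient sequential algorithms referred to above: it is a naive quadratic-time illustration, and the function name lz_factorize is hypothetical. It assumes the common definition in which each factor is either a single character that has not occurred before, or the longest prefix of the remaining text that also starts at an earlier position (the earlier occurrence may overlap the factor).

```python
def lz_factorize(text):
    """Naive Lempel-Ziv factorization (illustrative only, quadratic time or worse).

    Returns the list of factors as substrings of `text`.
    """
    n = len(text)
    factors = []
    i = 0
    while i < n:
        best_len = 0
        # Try every earlier starting position j < i as a candidate source.
        for j in range(i):
            k = 0
            # Extend the match; since j < i, j + k stays in bounds whenever i + k does.
            while i + k < n and text[j + k] == text[i + k]:
                k += 1
            if k > best_len:
                best_len = k
        if best_len == 0:
            # The character at position i has not occurred before.
            factors.append(text[i])
            i += 1
        else:
            # Longest previous match, possibly overlapping position i.
            factors.append(text[i:i + best_len])
            i += best_len
    return factors


if __name__ == "__main__":
    print(lz_factorize("abababab"))
```

Concatenating the returned factors always reproduces the input, which gives a quick sanity check for any faster implementation.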
