Compression of Low Entropy Strings with Lempel-Ziv Algorithms

We compare the compression ratio of the Lempel-Ziv algorithms with the empirical entropy of the input string. This approach makes it possible to analyze the performance of these algorithms without any assumption on the input and to obtain worst-case results. We show that in this setting the standard definition of an optimal compression algorithm is not satisfactory. In fact, although the Lempel-Ziv algorithms are optimal according to the standard definition, there exist families of low-entropy strings which are not compressed optimally. More precisely, the compression ratio achieved by LZ78 (resp., LZ77) can be much higher than the zeroth-order entropy $H_0$ (resp., the first-order entropy $H_1$). For this reason we introduce the concept of a $\lambda$-optimal algorithm. An algorithm is $\lambda$-optimal with respect to $H_k$ if, loosely speaking, its compression ratio is asymptotically bounded by $\lambda$ times the $k$th-order empirical entropy $H_k$. We prove that LZ78 cannot be $\lambda$-optimal with respect to any $H_k$ with $k\geq 0$. Then, we describe a new algorithm which combines LZ78 with run-length encoding (RLE) and is 3-optimal with respect to $H_0$. Finally, we prove that LZ77 is 8-optimal with respect to $H_0$, and that it cannot be $\lambda$-optimal with respect to $H_k$ for any $k\geq 1$.
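The gap between LZ78 and the empirical entropy can be illustrated with a small sketch. The code below is not taken from the paper: the function names `h0`, `hk`, and `lz78_phrases` are hypothetical, the entropy formulas follow the usual textbook definitions of empirical entropy (in bits per symbol), and the parser is a plain incremental LZ78 parsing with no output encoding. It is meant only to show, under these assumptions, that a string such as $a^n$ has zero zeroth-order entropy while LZ78 still splits it into a number of phrases that grows with $n$.

```python
import math
from collections import Counter, defaultdict


def h0(s):
    """Zeroth-order empirical entropy of s, in bits per symbol."""
    n = len(s)
    if n == 0:
        return 0.0
    # Sum of p * log2(1/p) over the observed symbol frequencies.
    return sum(c / n * math.log2(n / c) for c in Counter(s).values())


def hk(s, k):
    """k-th order empirical entropy of s, in bits per symbol.

    For each length-k context w, collect the symbols that follow w in s,
    then weight the zeroth-order entropy of each collection by its size.
    """
    if k == 0:
        return h0(s)
    n = len(s)
    if n == 0:
        return 0.0
    followers = defaultdict(list)
    for i in range(k, n):
        followers[s[i - k:i]].append(s[i])
    return sum(len(f) * h0(f) for f in followers.values()) / n


def lz78_phrases(s):
    """Incremental (LZ78-style) parsing of s into phrases.

    Each phrase is the shortest prefix of the remaining input that has
    not yet appeared as a phrase; the last phrase may be a repeat.
    """
    seen = set()
    phrases = []
    current = ""
    for ch in s:
        current += ch
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:
        phrases.append(current)
    return phrases


if __name__ == "__main__":
    for n in (100, 10_000):
        s = "a" * n
        print(n, h0(s), len(lz78_phrases(s)))
```

On $a^n$ the phrases have lengths 1, 2, 3, and so on, so the phrase count grows roughly like $\sqrt{2n}$ even though `h0` returns zero, which is the kind of low-entropy behavior the abstract refers to.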
