Redundancy of the Lempel-Ziv incremental parsing rule

The Lempel-Ziv codes are universal variable-to-fixed length codes that have become virtually standard in practical lossless data compression. For any given source output string from a Markov or unifilar source, we upper-bound the difference between the number of binary digits needed to encode the string and the self-information of the string. We use this result to demonstrate that for unifilar or Markov sources, the redundancy of encoding the first n letters of the source output with the Lempel-Ziv incremental parsing rule (LZ'78), the Welch modification (LZW), or a new variant is O((ln n)^{-1}), and we upper-bound the exact form of convergence. We conclude by considering the relationship between the code length and the empirical entropy associated with a string.
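
As a rough illustration of the quantities compared in the abstract, the following Python sketch parses a string with the incremental parsing rule and counts the bits of one simple textbook-style encoding, then subtracts the string's self-information. The function names, the particular bit-counting convention (a pointer into the current dictionary plus one raw symbol per phrase), and the memoryless source model are assumptions made for this example; the paper's exact code and its Markov/unifilar setting may differ.

```python
import math

def lz78_parse(s):
    """Incremental parsing: each phrase is the shortest prefix of the
    remaining input that is not yet in the dictionary."""
    dictionary = {""}          # starts with only the empty phrase
    phrases, current = [], ""
    for ch in s:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate            # keep extending the match
        else:
            dictionary.add(candidate)      # new phrase = old phrase + one letter
            phrases.append(candidate)
            current = ""
    if current:
        phrases.append(current)            # trailing phrase may repeat an old one
    return phrases

def lz78_code_length_bits(s, alphabet_size=2):
    """Bits used by an assumed simple encoding: the j-th phrase costs
    ceil(log2 j) bits for a pointer to one of j dictionary entries,
    plus ceil(log2 alphabet_size) bits for the appended letter."""
    symbol_bits = (alphabet_size - 1).bit_length()
    total = 0
    for j in range(1, len(lz78_parse(s)) + 1):
        total += (j - 1).bit_length() + symbol_bits
    return total

def self_information_bits(s, probs):
    """Self-information -log2 P(s) under an assumed memoryless source."""
    return -sum(math.log2(probs[ch]) for ch in s)

if __name__ == "__main__":
    s = "aababbbab"
    probs = {"a": 0.5, "b": 0.5}           # hypothetical source statistics
    bits = lz78_code_length_bits(s)
    info = self_information_bits(s, probs)
    print(lz78_parse(s))
    print("redundancy per letter:", (bits - info) / len(s))
```

On such a short string the per-letter gap is large; the abstract's result concerns how this gap shrinks as n grows.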
