A note on the Ziv-Lempel model for compressing individual sequences

The Ziv-Lempel compression algorithm is a string matching and parsing approach to data compression. The symbolwise equivalent for parsing models has been defined by Rissanen and Langdon and gives the same ideal codelength at the same cost in coding parameters. By describing the context and coding parameter for each symbol, an insight is provided into how the Ziv-Lempel method achieves compression. This treatment does not employ a probabilistic source for the data string. The Ziv-Lempel method effectively counts symbol instances within parsed phrases. The coding parameter for each symbolwise context is determined by cumulative count ratios. The code string length increase for a symbol $y$ following substring $s$, under the symbolwise equivalent, is the log of the ratio of node counts in subtrees $s$ and $s \cdot y$ of the Ziv-Lempel parsing tree. To demonstrate the symbolwise equivalent of the Ziv-Lempel algorithm, we extend the work of Rissanen and Langdon to incomplete parse trees. The result requires the proper handling of the comma when one phrase is the prefix of another phrase.
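The count-ratio statement above can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: the tree is built by plain incremental (LZ78-style) parsing, a subtree's count is simply the number of phrase nodes it contains (the node itself included), and the ratios are evaluated on the finished tree. The paper's treatment of the comma and of incomplete parse trees is not reproduced, and the function names are hypothetical.

```python
import math

def lz_phrases(text):
    """Incremental parsing: each new phrase is the longest earlier phrase
    extended by one symbol. Returns the set of tree nodes (phrases)."""
    phrases = {""}              # the root is the empty phrase
    current = ""
    for symbol in text:
        candidate = current + symbol
        if candidate in phrases:
            current = candidate     # keep descending the parsing tree
        else:
            phrases.add(candidate)  # a new leaf ends the phrase
            current = ""
    return phrases                  # a trailing unfinished phrase is dropped

def subtree_count(phrases, s):
    """Number of nodes in the subtree rooted at s (assumed count convention)."""
    return sum(1 for p in phrases if p.startswith(s))

def symbol_cost(phrases, s, y):
    """Code length increase in bits for symbol y after substring s:
    log2 of count(subtree s) / count(subtree s.y). Assumes node s+y exists."""
    return math.log2(subtree_count(phrases, s) / subtree_count(phrases, s + y))

def phrase_cost(phrases, phrase):
    """Sum of the per-symbol costs along the path from the root to the phrase."""
    return sum(symbol_cost(phrases, phrase[:i], phrase[i])
               for i in range(len(phrase)))

phrases = lz_phrases("1011010100010")   # parses as 1, 0, 11, 01, 010, 00, 10
print(sorted(phrases, key=lambda p: (len(p), p)))
print(phrase_cost(phrases, "010"))
```

Under this convention the per-symbol ratios telescope along a path, so the cost of a whole phrase equals the log of the root count divided by the count of the phrase's own subtree.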

[1]  C. E. Shannon, et al.  A mathematical theory of communication, 1948, Bell Syst. Tech. J.

[2]  Jorma Rissanen, et al.  Generalized Kraft Inequality and Arithmetic Coding, 1976, IBM J. Res. Dev.

[3]  Abraham Lempel, et al.  Compression of individual sequences via variable-rate coding, 1978, IEEE Trans. Inf. Theory.

[4]  Glen G. Langdon, et al.  Universal modeling and coding, 1981, IEEE Trans. Inf. Theory.

[5]  Jorma Rissanen, et al.  Compression of Black-White Images with Arithmetic Coding, 1981, IEEE Trans. Commun.