An implementable lossy version of the Lempel-Ziv algorithm - Part I: Optimality for memoryless sources

A new lossy variant of the fixed-database Lempel-Ziv coding algorithm for encoding at a fixed distortion level is proposed, and its asymptotic optimality and universality for memoryless sources (with respect to bounded single-letter distortion measures) are demonstrated: as the database size m increases to infinity, the expected compression ratio approaches the rate-distortion function. The complexity and redundancy characteristics of the algorithm are comparable to those of its lossless counterpart. A heuristic argument suggests that the redundancy is of order (log log m)/log m, and simulation results are presented that agree well with this rate. We show that there is a tradeoff between compression performance and encoding complexity, and we discuss how the relevant parameters can be chosen to balance this tradeoff in practice. We also discuss the performance of the algorithm when applied to sources with memory, and extensions to the cases of unbounded distortion measures and infinite reproduction alphabets.
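
For reference, the two asymptotic claims in the abstract can be written out as follows. This is only a restatement under assumed notation that does not appear in the abstract: P denotes the source distribution, \rho the bounded single-letter distortion measure, D the fixed distortion level, and r_m the compression ratio achieved with a database of size m. The rate-distortion function is given in its standard form for a memoryless source.

```latex
% Standard rate-distortion function of a memoryless source X ~ P
% (notation P, Q, \rho, D, r_m is assumed here, not taken from the abstract)
R(D) \;=\; \min_{Q(y \mid x)\,:\; \mathbb{E}[\rho(X,Y)] \le D} I(X;Y)

% Asymptotic optimality: the expected compression ratio approaches R(D)
\lim_{m \to \infty} \mathbb{E}[\, r_m \,] \;=\; R(D)

% Heuristic redundancy rate, reported to agree with simulations
\mathbb{E}[\, r_m \,] - R(D) \;=\; O\!\left( \frac{\log \log m}{\log m} \right)
```

The third line is the heuristic order stated in the abstract, not a proven bound.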
