On the average redundancy rate of the Lempel-Ziv code

In this paper, we settle a long-standing open problem concerning the average redundancy $r_n$ of the Lempel-Ziv'78 (LZ78) code. We prove that for a memoryless source the average redundancy rate satisfies, asymptotically, $E r_n = (A + \delta(n))/\log n + O(\log\log n/\log^2 n)$, where $A$ is an explicitly given constant that depends on source characteristics, and $\delta(x)$ is a fluctuating function with a small amplitude. We also derive the leading term for the $k$th moment of the number of phrases. We conclude by conjecturing a precise formula for the expected redundancy for a Markovian source. The main result of this paper is a consequence of the second-order properties of the Lempel-Ziv algorithm obtained by Jacquet and Szpankowski (1995). These findings were established by analytical techniques from the precise analysis of algorithms. We give a brief survey of these results, since they are interesting in their own right and shed some light on the probabilistic behavior of data compression based on pattern matching.
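As an illustration of the quantities discussed above, the following is a minimal sketch of LZ78 incremental parsing and of an empirical per-symbol redundancy for a binary memoryless source. The function names (`lz78_parse`, `lz78_code_length`, `empirical_redundancy`) are hypothetical. The cost model, in which the k-th phrase costs ceil(log2 k) pointer bits plus ceil(log2 of the alphabet size) symbol bits, is an assumption made for illustration and is not necessarily the exact code-length definition used in the paper.

```python
import math


def lz78_parse(text):
    """Split text into LZ78 phrases: each phrase is the shortest prefix of
    the remaining input that has not yet appeared as a phrase."""
    seen = set()
    phrases = []
    current = ""
    for symbol in text:
        current += symbol
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:
        # Trailing incomplete phrase: it repeats an earlier phrase.
        phrases.append(current)
    return phrases


def lz78_code_length(num_phrases, alphabet_size=2):
    """Total bits under the ASSUMED cost model: ceil(log2 k) pointer bits
    for the k-th phrase plus ceil(log2 alphabet_size) bits per new symbol."""
    symbol_bits = (alphabet_size - 1).bit_length()  # equals ceil(log2 alphabet_size)
    # (k - 1).bit_length() equals ceil(log2 k) for every integer k >= 1.
    return sum((k - 1).bit_length() + symbol_bits
               for k in range(1, num_phrases + 1))


def empirical_redundancy(text, p_one):
    """Code length per symbol minus the binary entropy h(p_one), in bits.
    Assumes text is a nonempty binary string and 0 < p_one < 1."""
    entropy = -(p_one * math.log2(p_one)
                + (1 - p_one) * math.log2(1 - p_one))
    bits = lz78_code_length(len(lz78_parse(text)), alphabet_size=2)
    return bits / len(text) - entropy


phrases = lz78_parse("1011010100010")
print(phrases)                         # phrases produced by the parsing
print(lz78_code_length(len(phrases)))  # bits under the assumed cost model
```

Averaging `empirical_redundancy` over many long sequences drawn from the same source gives a rough numerical view of the slowly decaying redundancy rate that the abstract describes. A single short input, such as the one above, says little about the asymptotic behavior.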

[1]  P. Billingsley,et al.  Convergence of Probability Measures , 1970, The Mathematical Gazette.

[2]  Donald Ervin Knuth,et al.  The Art of Computer Programming , 1968.

[3]  Abraham Lempel,et al.  A universal algorithm for sequential data compression , 1977, IEEE Trans. Inf. Theory.

[4]  Abraham Lempel,et al.  Compression of individual sequences via variable-rate coding , 1978, IEEE Trans. Inf. Theory.

[5]  Terry A. Welch,et al.  A Technique for High-Performance Data Compression , 1984, Computer.

[6]  Mireille Régnier,et al.  Normal Limiting Distribution of the Size of Tries , 1987, Performance.

[7]  D. Aldous,et al.  A diffusion limit for a class of randomly-growing binary trees , 1988.

[8]  Wojciech Szpankowski,et al.  Some Results on V-ary Asymmetric Tries , 1988, J. Algorithms.

[9]  Wojciech Szpankowski.  A Characterization of Digital Search Trees from the Successful Search Viewpoint , 1991, Theor. Comput. Sci.

[10]  Philippe Jacquet,et al.  Analysis of digital tries with Markovian dependency , 1991, IEEE Trans. Inf. Theory.

[11]  Edgar N. Gilbert,et al.  The Lempel-Ziv algorithm and message complexity , 1992, IEEE Trans. Inf. Theory.

[12]  Philippe Flajolet,et al.  Generalized Digital Trees and Their Difference-Differential Equations , 1992, Random Struct. Algorithms.

[13]  Marcelo J. Weinberger,et al.  Upper bounds on the probability of sequences emitted by finite-state sources and on the redundancy of the Lempel-Ziv algorithm , 1992, IEEE Trans. Inf. Theory.

[14]  Hosam M. Mahmoud,et al.  Evolution of random search trees , 1991, Wiley-Interscience series in discrete mathematics and optimization.

[15]  Wojciech Szpankowski,et al.  A Generalized Suffix Tree and its (Un)expected Asymptotic Behaviors , 1993, SIAM J. Comput.

[16]  Paul C. Shields,et al.  Universal redundancy rates do not exist , 1993, IEEE Trans. Inf. Theory.

[17]  Philippe Jacquet,et al.  Asymptotic Behavior of the Lempel-Ziv Parsing Scheme and Digital Search Trees , 1995, Theor. Comput. Sci.

[18]  Guy Louchard,et al.  Generalized Lempel-Ziv parsing scheme and its preliminary analysis of the average profile , 1995, Proceedings DCC '95 Data Compression Conference.

[19]  Guy Louchard,et al.  Average profile and limiting distribution for a phrase size in the Lempel-Ziv parsing algorithm , 1995, IEEE Trans. Inf. Theory.

[20]  Aaron D. Wyner,et al.  Improved redundancy of a version of the Lempel-Ziv algorithm , 1995, IEEE Trans. Inf. Theory.

[21]  Serap A. Savari,et al.  Redundancy of the Lempel-Ziv incremental parsing rule , 1997, IEEE Trans. Inf. Theory.

[22]  Guy Louchard,et al.  Average Profile of the Generalized Digital Search Tree and the Generalized Lempel-Ziv Algorithm , 1999, SIAM J. Comput.