Asymptotic Behavior of the Lempel-Ziv Parsing Scheme and Digital Search Trees

The Lempel-Ziv parsing scheme finds a wide range of applications, most notably in data compression and algorithms on words. It partitions a sequence of length $n$ into phrases of variable length such that each new phrase is the shortest substring not seen earlier as a phrase. The parameter of interest is the number $M_n$ of phrases that can be constructed from a sequence of length $n$. In this paper, for a memoryless source with unequal symbol probabilities, we derive the limiting distribution of $M_n$, which turns out to be normal. This settles a long-standing open problem. In fact, to obtain this result we solve another open problem, namely that of establishing the limiting distribution of the internal path length in a digital search tree. The latter is a consequence of an asymptotic solution of a multiplicative differential-functional equation that often arises in the analysis of algorithms on words. Interestingly, our findings are proved by a combination of probabilistic techniques, such as renewal equations and uniform integrability, and analytical techniques, such as the Mellin transform, differential-functional equations, and de-Poissonization. In the concluding remarks we indicate a possible extension of our results to Markovian models.
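The parsing rule and its link to digital search trees can be illustrated with a short sketch. It is a minimal illustration, not the authors' code. The names `lz78_parse` and `internal_path_length` are hypothetical. By assumption, an incomplete final phrase (one that merely repeats an earlier phrase because the input ran out) is still counted in $M_n$. Phrases are stored in a digital search tree whose root is the empty phrase. Each complete phrase is one node at depth equal to its length, so when the last phrase is complete the phrase lengths sum to $n$. That sum is exactly the internal path length of the tree, with the root at depth 0.

```python
from typing import Dict, List, Tuple

Tree = List[Dict[str, int]]  # node id -> {symbol: child node id}; node 0 is the root


def lz78_parse(seq: str) -> Tuple[List[str], Tree]:
    """Parse seq into Lempel-Ziv (LZ78) phrases, building the digital search tree.

    Each new phrase is the shortest prefix of the remaining input that has not
    occurred earlier as a phrase. Node 0 stands for the empty phrase, and each
    complete phrase is one node whose depth equals the phrase length.
    """
    tree: Tree = [{}]
    phrases: List[str] = []
    i, n = 0, len(seq)
    while i < n:
        node, start = 0, i
        # Walk down the tree along earlier phrases as far as the input matches.
        while i < n and seq[i] in tree[node]:
            node = tree[node][seq[i]]
            i += 1
        if i < n:
            # One more symbol yields an unseen phrase: attach it as a new node.
            tree[node][seq[i]] = len(tree)
            tree.append({})
            i += 1
        # If the input ran out mid-walk, this final phrase repeats an earlier one
        # and adds no node; it is still counted (an assumption of this sketch).
        phrases.append(seq[start:i])
    return phrases, tree


def internal_path_length(tree: Tree, node: int = 0, depth: int = 0) -> int:
    """Sum of the depths of all nodes in the tree, with the root at depth 0."""
    return depth + sum(internal_path_length(tree, child, depth + 1)
                       for child in tree[node].values())


if __name__ == "__main__":
    s = "1011010100010"
    phrases, tree = lz78_parse(s)
    print(phrases)                     # ['1', '0', '11', '01', '010', '00', '10']
    print(len(phrases))                # M_n = 7 for n = 13
    print(internal_path_length(tree))  # 13, equal to len(s) since the last phrase is complete
```

The recursive path-length helper is adequate for moderate inputs. Very long, highly repetitive sequences produce deep trees and may call for an iterative traversal instead.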
