Upper Bounds On The Probability Of Sequences Emitted By Finite-state Sources And On The Redundancy Of The Lempel-Ziv Algorithm

An upper bound on the probability of a sequence drawn from a finite-state source is derived. The bound is given in terms of the number of phrases obtained by parsing the sequence according to the Lempel-Ziv (L-Z) incremental parsing rule, and is universal in the sense that it does not depend on the statistical parameters that characterize the source. This bound is used to derive an upper bound on the redundance of the L-Z universal data compression algorithm applied to finite-state sources, that depends on the length N of the sequence, on the number K of states of the source, and, eventually, on the source entropy. A variation of the L-Z algorithm is presented, and an upper bound on its redundancy is derived for finite-state sources. A method to derive tighter implicit upper bounds on the redundancy of both algorithms is also given, and it is shown that for the proposed variation this bound is smaller than for the original L-Z algorithm, or every value of N and K. >