Summary form only given. A fundamental problem in constructing statistical techniques for data compression of sequential text is generating probabilities from counts of previous occurrences. Each context used in the statistical model accumulates counts of the number of times each symbol has occurred in that context, so in a binary alphabet there are two counts, C_0 and C_1 (the number of times a 0 or a 1 has occurred). The problem is then to take these counts and generate from them a probability that the next character will be a 0 or a 1. A naive estimate of the probability of character i is the ratio p_i = C_i/(C_0 + C_1). A fundamental flaw of this estimate is that it yields a zero probability whenever C_0 or C_1 is zero. Unfortunately, a zero probability prevents coding from working correctly, as the "optimum" code length in that case is infinite. Consequently, any estimate of the probabilities must be non-zero even in the presence of zero counts. This is called the zero-frequency problem. A well-known solution was formulated by Laplace and is known as Laplace's law of succession. We have investigated the correctness of Laplace's law by experiment.
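The two estimators described above can be sketched as follows for a binary alphabet. This is an illustrative sketch, not the authors' implementation: the function names are invented here, and the add-one correction is the standard statement of Laplace's law of succession.

```python
import math

def naive_estimate(c0, c1):
    """Naive ratio estimate p_i = C_i / (C_0 + C_1).

    Yields a zero probability whenever one count is zero (and is
    undefined when both are zero), which breaks coding.
    """
    total = c0 + c1
    return (c0 / total, c1 / total)

def laplace_estimate(c0, c1):
    """Laplace's law of succession: p_i = (C_i + 1) / (C_0 + C_1 + 2).

    Adding one to each count guarantees a non-zero probability
    even for symbols that have never occurred in this context.
    """
    total = c0 + c1 + 2
    return ((c0 + 1) / total, (c1 + 1) / total)

def code_length(p):
    """Ideal code length in bits, -log2(p); infinite when p == 0."""
    return math.inf if p == 0 else -math.log2(p)
```

With counts (0, 5), the naive estimate assigns probability 0 to the symbol 0 (infinite code length), while the Laplace estimate assigns it 1/7, keeping the code length finite.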