We develop the theory of lossy data compression from a statistical point of view. This is motivated in part by the enormous success of the statistical approach in lossless compression, in particular Rissanen’s celebrated Minimum Description Length (MDL) principle. We give a precise characterization of the fundamental limits of compression performance for arbitrary data sources and general distortion measures. The starting point is the observation that there is a precise correspondence between compression algorithms and probability distributions, analogous to the correspondence provided by the Kraft inequality in lossless compression. This leads us to formulate a version of the MDL principle for lossy data compression. We discuss the consequences of the lossy MDL principle and explain how it suggests practical lessons for vector-quantizer design. We introduce two methods for selecting efficient compression algorithms, the lossy Maximum Likelihood Estimate (LMLE) and the lossy Minimum Description Length Estimate (LMDLE). We describe their theoretical performance and give examples in which the LMDLE outperforms the LMLE.
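To make the two estimators concrete, here is a minimal Python sketch under assumptions of our own: a binary source, the family of i.i.d. Bernoulli(q) reproduction distributions, normalized Hamming distortion, and a hypothetical dyadic grid of q values whose description cost of b + (2⌊log2(b+1)⌋ + 1) bits satisfies the Kraft inequality. Following the code/distribution correspondence described above, it takes the code length of x under Q to be the lossy likelihood -log2 Q^n(B(x, D)), where B(x, D) is the distortion ball of radius D around x. The LMLE minimizes this quantity over the grid, and the LMDLE adds the bits needed to describe q. The function names, grid, and penalty are illustrative choices, not the constructions used in the work itself.

```python
import math

# Hypothetical setup (not from the text): binary source strings x, reproduction
# distributions Q = i.i.d. Bernoulli(q), normalized Hamming distortion.


def ball_probability(x, q, D):
    """Q^n(B(x, D)): probability that Y ~ Bernoulli(q)^n is within distortion D of x."""
    n, k = len(x), sum(x)
    m = math.floor(n * D + 1e-12)  # max number of mismatches; epsilon guards float round-off

    def pmf(N, t, p):
        return math.comb(N, t) * p**t * (1 - p) ** (N - t)

    total = 0.0
    # a = mismatches where x_i = 1 (Y_i = 0, prob 1 - q each),
    # b = mismatches where x_i = 0 (Y_i = 1, prob q each); need a + b <= m.
    for a in range(min(k, m) + 1):
        pa = pmf(k, a, 1 - q)
        for b in range(min(n - k, m - a) + 1):
            total += pa * pmf(n - k, b, q)
    return total


def lossy_codelength(x, q, D):
    """Lossy likelihood in bits: -log2 Q^n(B(x, D))."""
    return -math.log2(ball_probability(x, q, D))


def candidates(max_depth):
    """Yield (q, bits) for dyadic midpoints q = (j + 0.5) / 2**b, 0 <= b <= max_depth.

    bits = b (to name j) + Elias-gamma length of b + 1 (to name the depth b),
    so sum(2**-bits) <= 1, i.e. the parameter code satisfies the Kraft inequality.
    """
    for b in range(max_depth + 1):
        gamma = 2 * ((b + 1).bit_length() - 1) + 1
        for j in range(2**b):
            yield (j + 0.5) / 2**b, b + gamma


def lmle(x, D, max_depth=6):
    """Grid point minimizing the lossy codelength alone."""
    return min(candidates(max_depth), key=lambda c: lossy_codelength(x, c[0], D))[0]


def lmdle(x, D, max_depth=6):
    """Grid point minimizing lossy codelength plus the bits needed to describe q."""
    return min(candidates(max_depth), key=lambda c: lossy_codelength(x, c[0], D) + c[1])[0]


if __name__ == "__main__":
    x = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0] * 4  # n = 40, twelve ones
    D = 0.1
    print("LMLE q =", lmle(x, D), " LMDLE q =", lmdle(x, D))
```

In this sketch the LMLE is free to choose arbitrarily fine grid points as `max_depth` grows, while the penalty term in the LMDLE charges for that extra precision. The sketch only illustrates the two selection rules; it makes no claim about which estimator performs better on any particular input.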