A Minimum Description Length Proposal for Lossy Data Compression

We give a development of the theory of lossy data compression from the point of view of statistics. This is partly motivated by the enormous success of the statistical approach in lossless compression, in particular Rissanen’s celebrated Minimum Description Length (MDL) principle. A precise characterization of the fundamental limits of compression performance is given, for arbitrary data sources and with respect to general distortion measures. The starting point for this development is the observation that there is a precise correspondence between compression algorithms and probability distributions (in analogy with the Kraft inequality in lossless compression). This leads us to formulate a version of the MDL principle for lossy data compression. We discuss the consequences of the lossy MDL principle and explain how it leads to potential practical lessons for vector-quantizer design. We introduce two methods for selecting efficient compression algorithms, the lossy Maximum Likelihood Estimate (LMLE) and the lossy Minimum Description Length Estimate (LMDLE). We describe their theoretical performance and give examples illustrating how the LMDLE outperforms the LMLE.
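As a minimal illustration of the codelength–distribution correspondence invoked above, the following sketch (not from the paper; the function names are our own) shows the lossless Kraft-inequality analogy: a probability distribution Q induces integer codeword lengths ⌈−log₂ Q(x)⌉, and those lengths satisfy the Kraft inequality, so a prefix code with those lengths exists.

```python
import math

def kraft_sum(lengths):
    """Kraft sum sum_i 2^{-l_i} of a collection of codeword lengths."""
    return sum(2.0 ** -l for l in lengths)

def ideal_codelengths(dist):
    """Integer codelengths ceil(-log2 Q(x)) induced by a distribution Q."""
    return {x: math.ceil(-math.log2(p)) for x, p in dist.items() if p > 0}

# A (hypothetical) distribution over four symbols and its induced codelengths.
Q = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
lengths = ideal_codelengths(Q)  # {"a": 1, "b": 2, "c": 3, "d": 3}

# The induced lengths satisfy the Kraft inequality, so a prefix code with
# these lengths exists; here the sum is exactly 1, i.e. the code is complete.
assert kraft_sum(lengths.values()) <= 1.0
```

The lossy theory developed in the paper rests on an analogous correspondence, with codes replaced by compression algorithms operating under a distortion constraint.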