Coding and Compression: A Happy Union of Theory and Practice

The mathematical theory behind coding and compression began a little more than 50 years ago with the publication of Claude Shannon's (1948) "A Mathematical Theory of Communication" in the Bell System Technical Journal. This article laid the foundation for what is now known as information theory within a probabilistic mathematical framework (see, e.g., Cover and Thomas 1991; Verdú 1998): Shannon modeled the signal or message process as a random process and a communication channel as a random transition matrix that may distort the message. In the five decades that followed, information theory provided fundamental limits for communication in general and for coding and compression in particular. These limits, predicted by information theory under probabilistic models, are now being approached in real products such as computer modems. Because these limits, or fundamental communication quantities such as entropy and channel capacity, vary from signal process to signal process and from channel to channel, they must be estimated for each communication setup. In this sense, information theory is intrinsically statistical.

Moreover, the algorithmic theory of information has inspired an extension of Shannon's ideas that provides a formal measure of information of the kind long sought in statistical inference and modeling. This measure has led to the minimum description length (MDL) principle for modeling in general and model selection in particular (Barron, Rissanen, and Yu 1998; Hansen and Yu 1998; Rissanen 1978, 1989).

A coding or compression algorithm is at work whenever one surfs the web, listens to a CD, uses a cellular phone, or works on a computer. In particular, when a music file is downloaded over the internet, a losslessly compressed file (often much smaller in size) is transmitted instead of the original. Lossless compression works because the music signal is statistically redundant, and this redundancy can be removed through statistical prediction. For digital signals, integer prediction is easily carried out from past samples that are available to both the sender and the receiver, so only the residuals from the prediction need to be transmitted. These residuals can be coded at a much lower rate than the original signal (see, e.g., Edler, Huang, Schuller, and Yu 2000).
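The predictive step described above can be illustrated with a short, self-contained sketch. The Python code below is not from the article; the toy signal, the previous-sample predictor, and the plug-in entropy estimate are illustrative assumptions. It simply shows that the residuals of an integer prediction based on past samples have a much lower empirical entropy, and hence a lower achievable coding rate, than the raw samples.

```python
import numpy as np

def empirical_entropy(values):
    """Plug-in estimate of Shannon entropy in bits per symbol."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

# Hypothetical stand-in for a digital audio signal: a slowly varying
# waveform quantized to integers, with a little additive noise.
rng = np.random.default_rng(0)
t = np.arange(10_000)
signal = np.round(100 * np.sin(2 * np.pi * t / 500)
                  + rng.normal(0, 2, t.size)).astype(int)

# Integer prediction from the past: predict each sample by the previous
# one (known to both sender and receiver) and keep only the residuals.
prediction = np.concatenate(([0], signal[:-1]))
residuals = signal - prediction

print(f"entropy of raw samples: {empirical_entropy(signal):.2f} bits/sample")
print(f"entropy of residuals:   {empirical_entropy(residuals):.2f} bits/sample")
```

On a toy signal like this one, the residuals typically require only a fraction of the bits per sample that the raw samples would, which is the redundancy that an entropy coder (e.g., an arithmetic coder) then removes in practice.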

[1] Jorma Rissanen, et al. A multiplication-free multialphabet arithmetic code, 1989, IEEE Trans. Commun.

[2] Sergio Verdú, et al. Fifty Years of Shannon Theory, 1998, IEEE Trans. Inf. Theory.

[3] Toby Berger, et al. Lossy Source Coding, 1998, IEEE Trans. Inf. Theory.

[4] Antonio Ortega, et al. Image subband coding using context-based classification and adaptive quantization, 1999, IEEE Trans. Image Process.

[5] Michael T. Orchard, et al. Image coding based on mixture modeling of wavelet coefficients and a fast estimation-quantization framework, 1997, Proceedings DCC '97, Data Compression Conference.

[6] Guillermo Sapiro, et al. The LOCO-I lossless image compression algorithm: principles and standardization into JPEG-LS, 2000, IEEE Trans. Image Process.

[7] Abraham Lempel, et al. A universal algorithm for sequential data compression, 1977, IEEE Trans. Inf. Theory.

[8] Abraham Lempel, et al. Compression of individual sequences via variable-rate coding, 1978, IEEE Trans. Inf. Theory.

[9] Jorma Rissanen, et al. The Minimum Description Length Principle in Coding and Modeling, 1998, IEEE Trans. Inf. Theory.

[10] Myron Tribus. Thirty years of information theory, 1983.

[11] Thomas M. Cover, et al. Elements of Information Theory, 2005.

[12] Jorma Rissanen, et al. Generalized Kraft Inequality and Arithmetic Coding, 1976, IBM J. Res. Dev.

[13] Prakash Narayan, et al. Reliable Communication Under Channel Uncertainty, 1998, IEEE Trans. Inf. Theory.

[14] Bin Yu, et al. Perceptual audio coding using adaptive pre- and post-filters and lossless compression, 2002, IEEE Trans. Speech Audio Process.

[15] J. Rissanen. Modeling by Shortest Data Description, 1978, Automatica.

[16] Bin Yu, et al. Model Selection and the Principle of Minimum Description Length, 2001.

[17] Shlomo Shamai, et al. Fading Channels: Information-Theoretic and Communication Aspects, 1998, IEEE Trans. Inf. Theory.