Wald Lecture I: Counting Bits with Kolmogorov and Shannon

Shannon's Rate-Distortion Theory describes the number of bits needed to approximately represent typical realizations of a stochastic process $X = (X(t) : t \in T)$, while Kolmogorov's $\epsilon$-entropy describes the number of bits needed to approximately represent an arbitrary member $f = (f(t) : t \in T)$ of a functional class $\mathcal{F}$. For many stochastic processes a great deal is known about the behavior of the rate-distortion function, while for few functional classes $\mathcal{F}$ has there been success in determining, say, the precise asymptotics of the $\epsilon$-entropy. Let $W^m_{2,0}(\gamma)$ denote the class of functions $f(t)$ on $T = [0, 2\pi)$ with periodic boundary conditions and
$$ \frac{1}{2\pi}\int_0^{2\pi} f(t)^2\,dt + \frac{1}{2\pi}\int_0^{2\pi} f^{(m)}(t)^2\,dt \le \gamma^2. $$
We show that for approximating functions of this class in $L^2$ norm we have the precise asymptotics of the Kolmogorov $\epsilon$-entropy:
$$ H_\epsilon(W^m_{2,0}(\gamma)) \sim 2m(\log_2 e)\left(\frac{\gamma}{2\epsilon}\right)^{1/m}, \qquad \epsilon \to 0. \tag{0.1} $$
This follows from a connection between the Shannon and Kolmogorov theories, which allows us to exploit the powerful formalism of Shannon's Rate-Distortion Theory to obtain information about the Kolmogorov $\epsilon$-entropy. In fact, the Kolmogorov $\epsilon$-entropy is asymptotically equivalent, as $\epsilon \to 0$, to the maximal rate-distortion $R(D, X)$ over all stochastic processes $X$ with sample paths in $W^m_{2,0}(\gamma)$, where we make the calibration $D = \epsilon^2$. There is a family of Gaussian processes $X^*_D$ which asymptotically, as $D \to 0$, take realizations in $W^m_{2,0}(\gamma)$, and for which the process at index $D$ has essentially the highest rate-distortion $R(D, X)$ of all processes $X$ living in $W^m_{2,0}(\gamma)$. We evaluate the rate-distortion function of members of this family, giving formula (0.1). These results strongly parallel a key result in modern statistical decision theory, Pinsker's theorem. This points to a connection between theories of statistical estimation and data compression, which will be the theme of these Lectures.
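To make the Shannon-side computation concrete, here is a minimal numerical sketch; it is not code from the paper, and the flat-spectrum family it uses is an illustrative assumption, not the extremal family $X^*_D$. It evaluates, via Shannon's reverse water-filling formula, the rate-distortion function $R(D)$ of Gaussian processes whose Fourier coefficients meet the ellipsoid budget defining $W^m_{2,0}(\gamma)$ in expectation, and compares the result with the right-hand side of (0.1) under the calibration $D = \epsilon^2$. The parameters `m`, `gamma`, and `n_max` are arbitrary choices for the demonstration.

```python
# A numerical sketch of the Shannon-side calculation behind (0.1).
# Assumed, for illustration only: a "flat-spectrum" Gaussian process with
# variance tau^2 on Fourier coordinates 1..n and zero beyond, where
# tau^2 = gamma^2 / sum_{k<=n} (1 + k^{2m}), so the ellipsoid budget
# E sum_k (1 + k^{2m}) X_k^2 = gamma^2 holds in expectation.  This is NOT
# Donoho's extremal family X*_D, so R(D) lands a constant factor below the
# asymptote while exhibiting the same (gamma/2 eps)^(1/m) growth.

import numpy as np

def best_flat_rate(m: int, gamma: float, D: float, n_max: int = 200_000) -> float:
    """Largest R(D), in bits, over the flat-spectrum family described above.
    For n i.i.d. N(0, tau^2) coordinates at total squared-error distortion D,
    reverse water-filling (Cover & Thomas, Ch. 10) reduces to
    R = (n/2) log2(n tau^2 / D) whenever D < n tau^2."""
    k = np.arange(1, n_max + 1, dtype=float)
    budget = np.cumsum(1.0 + k ** (2 * m))          # sum_{k<=n} (1 + k^{2m})
    tau2 = gamma ** 2 / budget                      # per-coordinate variance
    total_var = k * tau2                            # n * tau^2
    rate = np.where(D < total_var,
                    0.5 * k * np.log2(total_var / D), 0.0)
    return float(rate.max())                        # best choice of n

m, gamma = 2, 1.0                                   # illustrative parameters
for eps in (0.1, 0.03, 0.01, 0.003, 0.001):
    D = eps ** 2                                    # calibration D = eps^2
    R = best_flat_rate(m, gamma, D)
    H = 2 * m * np.log2(np.e) * (gamma / (2 * eps)) ** (1.0 / m)
    print(f"eps={eps:6.3f}   flat-spectrum R(D) = {R:8.1f} bits   "
          f"asymptote (0.1) = {H:8.1f} bits")
```

Because the flat spectrum is not least favorable, the printed $R(D)$ stays below the asymptote by a constant factor, but the common $(\gamma/2\epsilon)^{1/m}$ growth in both columns is exactly the equivalence between $H_\epsilon$ and the maximal rate-distortion described above.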
