A strong version of the redundancy-capacity theorem of universal coding

The capacity of the channel induced by a given class of sources is well known to be an attainable lower bound on the redundancy of universal codes with respect to this class, both in the minimax sense and in the Bayesian (maximin) sense. We show that this capacity is essentially a lower bound also in a stronger sense, that is, for "most" sources in the class. This result extends Rissanen's (1984, 1986) lower bound for parametric families. We demonstrate the applicability of this result in several examples, e.g., parametric families with growing dimensionality, piecewise-fixed sources, arbitrarily varying sources, and noisy samples of learnable functions. Finally, we discuss the implications of our results for statistical inference.
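For concreteness, here is a brief sketch, in standard notation, of the quantities the abstract refers to; the symbols below ($P_\theta^n$, $Q$, $w$, $C_n$, $L$) are our own shorthand, not the paper's, and regularity conditions (e.g., $C_n \to \infty$) are suppressed.

% Minimax redundancy of the class {P_theta : theta in Lambda} over n-tuples,
% with Q ranging over coding (probability) distributions:
\[
  R_n^{+} \;=\; \min_{Q}\,\sup_{\theta\in\Lambda} D\!\left(P_\theta^{n} \,\big\|\, Q\right).
\]
% Redundancy-capacity theorem: R_n^+ equals the capacity C_n of the "channel"
% theta -> x^n induced by the class, i.e., the maximin (Bayesian) redundancy:
\[
  R_n^{+} \;=\; \max_{w} I_w(\Theta; X^{n}) \;=\; C_n ,
\]
% where w is a prior on Lambda and I_w the mutual information it induces.
% Strong version, stated schematically: for any length function L and any
% eps > 0, the sources whose individual redundancy falls below (1 - eps) C_n
% form a set of vanishing probability under the capacity-achieving prior w*:
\[
  w^{*}\!\left\{\theta : \; \mathbf{E}_\theta L(X^{n}) - H_\theta(X^{n}) \,\le\, (1-\varepsilon)\,C_n\right\} \;\longrightarrow\; 0 .
\]
% For smooth k-parameter families C_n grows like (k/2) log n, which recovers
% Rissanen's lower bound of essentially (k/2) log n for "most" parameter values.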

[1] Vladimir Vapnik and Alexey Chervonenkis, On the uniform convergence of relative frequencies of events to their probabilities, 1971.

[2] Demetrios Kazakos, Robust noiseless source coding through a game theoretical approach, IEEE Trans. Inf. Theory, 1983.

[3] Lee D. Davisson et al., Minimax noiseless universal coding for Markov sources, IEEE Trans. Inf. Theory, 1983.

[4] Bertrand Clarke and Andrew Barron, Jeffreys' prior is asymptotically least favorable under entropy risk, J. Statist. Plann. Inference, 1994.

[5] Jorma Rissanen, Universal coding, information, prediction, and estimation, IEEE Trans. Inf. Theory, 1984.

[6] Anselm Blumer, Minimax universal noiseless coding for unifilar and Markov sources, IEEE Trans. Inf. Theory, 1987.

[7] Jorma Rissanen, Complexity of strings in the class of Markov sources, IEEE Trans. Inf. Theory, 1986.

[8] Jorma Rissanen, Fisher information and stochastic complexity, IEEE Trans. Inf. Theory, 1996.

[9] Alberto Leon-Garcia et al., A source matching approach to finding minimax codes, IEEE Trans. Inf. Theory, 1980.

[10] Neri Merhav et al., On the minimum description length principle for sources with piecewise constant parameters, IEEE Trans. Inf. Theory, 1993.

[11] David Haussler et al., Bounds on the sample complexity of Bayesian learning using information theory and the VC dimension, Proc. COLT '91, 1991.

[12] Michael B. Pursley et al., Efficient universal noiseless source codes, IEEE Trans. Inf. Theory, 1981.

[13] J. Bernardo, Reference posterior distributions for Bayesian inference, 1979.

[14] Neri Merhav et al., Optimal sequential probability assignment for individual sequences, IEEE Trans. Inf. Theory, 1994.

[15] Neri Merhav et al., Universal coding for arbitrarily varying sources, Proc. 1995 IEEE Int. Symp. on Information Theory, 1995.

[16] Wen-Chen Chen et al., On total boundedness for existence of weakly minimax universal codes, IEEE Trans. Inf. Theory, 1981.

[17] I. Csiszár, I-divergence geometry of probability distributions and minimization problems, Ann. Probab., 1975.

[18] Lee D. Davisson, Universal noiseless coding, IEEE Trans. Inf. Theory, 1973.

[19] David Haussler et al., How well do Bayes methods work for on-line prediction of ±1 values?, 1992.