General bounds on the mutual information between a parameter and n conditionally independent observations

Each parameter θ in an abstract parameter space Θ is associated with a different probability distribution on a set Y. A parameter θ is chosen at random from Θ according to some a priori distribution on Θ, and n conditionally independent random variables Y^n = Y_1, ..., Y_n are observed, each with common distribution determined by θ. We obtain bounds on the mutual information between the random variable Θ, giving the choice of parameter, and the random variable Y^n, giving the sequence of observations. We also bound the supremum of the mutual information over choices of the prior distribution on Θ. These quantities have applications in density estimation, computational learning theory, universal coding, hypothesis testing, and portfolio selection theory. The bounds are given in terms of the metric and information dimensions of the parameter space with respect to the Hellinger distance.
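For concreteness, the two quantities named in the abstract can be written out explicitly. The following is a standard formulation rather than a quotation from the paper, and the factor 1/2 in the Hellinger distance is one common normalization convention.

\[
I(\Theta; Y^n) \;=\; \int_{\Theta} D\!\left(P_\theta^{\,n} \,\middle\|\, M_n\right) d\pi(\theta),
\qquad
M_n \;=\; \int_{\Theta} P_\theta^{\,n}\, d\pi(\theta),
\]

where \(\pi\) is the prior on \(\Theta\), \(P_\theta^{\,n}\) is the n-fold product distribution of the observations given \(\theta\), \(M_n\) is the resulting prior-predictive (mixture) distribution of \(Y^n\), and \(D(\cdot\,\|\,\cdot)\) is the Kullback–Leibler divergence. The Hellinger distance with respect to which the metric and information dimensions of \(\Theta\) are taken is

\[
d_H(\theta, \theta')^2 \;=\; \frac{1}{2} \int_Y \left( \sqrt{p_\theta} - \sqrt{p_{\theta'}} \right)^2 d\nu ,
\]

with \(p_\theta = dP_\theta / d\nu\) the densities relative to a common dominating measure \(\nu\).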
