From ε-entropy to KL-entropy: Analysis of minimum information complexity density estimation

We consider an extension of ε-entropy to a KL-divergence-based complexity measure for randomized density estimation methods. Based on this extension, we develop a general information-theoretic inequality that measures the statistical complexity of certain deterministic and randomized density estimators. We then present several consequences of the new inequality. In particular, we show that this technique can improve some classical results on the convergence of minimum description length (MDL) estimators and Bayesian posterior distributions. Moreover, we derive clean finite-sample convergence bounds that previous approaches cannot obtain.
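The KL-based complexity measure can be made concrete on a finite parameter grid. The following is a minimal sketch, not the paper's construction. It assumes three things. First, the complexity of a randomized estimator w (a distribution over parameters) relative to a prior π is KL(w‖π). Second, the estimator minimizes an information-complexity objective of the form λ·E_w[−log-likelihood] + KL(w‖π). Third, the grid stands in for the parameter space. The helper names (`kl_divergence`, `log_partition`, `gibbs_posterior`, `objective`) and the Bernoulli data are hypothetical. Under these assumptions the minimizer is the generalized posterior w ∝ π·p_θ(X)^λ, which is the ordinary Bayes posterior at λ = 1. A point mass under a uniform prior on N grid points has complexity log N, the log-cardinality that ε-entropy assigns to an ε-net.

```python
import numpy as np

# Hypothetical helpers for illustration; a finite grid of parameter values stands in for Theta.

def kl_divergence(w, prior):
    """KL(w || prior) in nats for discrete distributions on the same grid."""
    mask = w > 0  # terms with w = 0 contribute 0 by convention
    return float(np.sum(w[mask] * np.log(w[mask] / prior[mask])))

def log_partition(log_lik, prior, lam=1.0):
    """log sum_theta prior(theta) * exp(lam * log_lik(theta)), computed stably."""
    logits = np.log(prior) + lam * log_lik
    m = logits.max()
    return float(m + np.log(np.sum(np.exp(logits - m))))

def gibbs_posterior(log_lik, prior, lam=1.0):
    """Generalized posterior: w(theta) proportional to prior(theta) * p_theta(X)^lam."""
    logits = np.log(prior) + lam * log_lik
    return np.exp(logits - log_partition(log_lik, prior, lam))

def objective(w, log_lik, prior, lam=1.0):
    """Assumed information-complexity objective: lam * E_w[-log-likelihood] + KL(w || prior)."""
    return float(lam * np.dot(w, -log_lik) + kl_divergence(w, prior))

# Bernoulli model on a grid of N parameter values with a uniform prior.
thetas = np.linspace(0.05, 0.95, 19)
prior = np.full(len(thetas), 1.0 / len(thetas))
x = np.array([1, 1, 0, 1, 1, 0, 1, 1])
k, n = x.sum(), len(x)
log_lik = k * np.log(thetas) + (n - k) * np.log(1 - thetas)

# Randomized estimator: the lam = 1 Gibbs posterior (the ordinary Bayes posterior).
w_hat = gibbs_posterior(log_lik, prior)

# Deterministic estimator: a point mass on the maximum-likelihood grid point.
point = np.zeros(len(thetas))
point[np.argmax(log_lik)] = 1.0

print(kl_divergence(point, prior), np.log(len(thetas)))  # both equal log N
print(objective(w_hat, log_lik, prior), -log_partition(log_lik, prior))  # equal at the minimizer
print(objective(point, log_lik, prior))  # never smaller than the Gibbs value
```

The point-mass objective is a two-part code length of the MDL type: log N nats to name the grid point plus the negative log-likelihood. The Gibbs posterior attains the smaller value −log Σ π·p_θ(X)^λ. This illustrates, under the stated assumptions, how a randomized estimator's KL complexity generalizes the log-cardinality used for deterministic ones.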
