A Bernstein-von Mises Theorem for discrete probability distributions

We investigate the asymptotic normality of the posterior distribution in the discrete setting, when the model dimension increases with the sample size. We consider a probability mass function $\theta_0$ on $\mathbb{N}\setminus\{0\}$ and a sequence of truncation levels $(k_n)_n$ satisfying $k_n^3 \le n \inf_{i \le k_n} \theta_0(i)$. Let $\hat\theta_n$ denote the maximum likelihood estimate of $(\theta_0(i))_{i \le k_n}$ and let $\Delta_n(\theta_0)$ denote the $k_n$-dimensional vector whose $i$-th coordinate is $\sqrt{n}\,(\hat\theta_n(i) - \theta_0(i))$ for $1 \le i \le k_n$. We check that, under mild conditions on $\theta_0$ and on the sequence of prior probabilities on the $k_n$-dimensional simplices, the variation distance between the posterior distribution, recentered around $\hat\theta_n$ and rescaled by $\sqrt{n}$, and the $k_n$-dimensional Gaussian distribution $\mathcal{N}(\Delta_n(\theta_0), I^{-1}(\theta_0))$ converges in probability to 0. This theorem can be used to prove the asymptotic normality of Bayesian estimators of Shannon and Rényi entropies. The proofs are based on concentration inequalities for centered and non-centered chi-square (Pearson) statistics. The latter allow us to establish posterior concentration rates with respect to the Fisher distance rather than the Hellinger distance, as is commonplace in non-parametric Bayesian statistics.
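The setting of the abstract can be illustrated numerically. The sketch below is not the authors' procedure: it assumes a geometric mass function $\theta_0(i) = 2^{-i}$, a uniform Dirichlet prior on the truncated simplex (with all mass beyond $k_n$ lumped into one extra cell), and it compares only the covariance of the rescaled posterior draws with the plug-in inverse Fisher information $\mathrm{diag}(\hat\theta_n) - \hat\theta_n \hat\theta_n^{\top}$, not the variation distance itself. The function names `truncation_level`, `rescaled_posterior_draws` and `inverse_fisher` are hypothetical helpers introduced for this illustration.

```python
import numpy as np


def truncation_level(theta0, n, k_max=10_000):
    """Largest k <= k_max with k**3 <= n * min_{i <= k} theta0(i)."""
    k, running_min = 0, float("inf")
    for j in range(1, k_max + 1):
        running_min = min(running_min, theta0(j))
        if j ** 3 > n * running_min:
            break  # the left side grows, the right side shrinks: stop here
        k = j
    return k


def rescaled_posterior_draws(x, k, n_draws, rng, prior_mass=1.0):
    """Draw sqrt(n) * (theta - theta_hat) under a Dirichlet posterior.

    Assumption: symmetric Dirichlet prior with parameter `prior_mass` on
    k + 1 cells (cells 1..k plus one lumped cell for all values > k).
    """
    n = x.size
    # Cell counts for 1..k, with every observation above k sent to cell k + 1.
    counts = np.bincount(np.minimum(x, k + 1), minlength=k + 2)[1:]
    theta_hat = counts / n  # maximum likelihood estimate
    draws = rng.dirichlet(prior_mass + counts, size=n_draws)
    z = np.sqrt(n) * (draws[:, :k] - theta_hat[:k])
    return z, theta_hat[:k]


def inverse_fisher(theta):
    """Inverse Fisher information of a multinomial in its first k coordinates."""
    return np.diag(theta) - np.outer(theta, theta)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 20_000
    k = truncation_level(lambda i: 2.0 ** -i, n)
    x = rng.geometric(0.5, size=n)  # P(X = i) = 2**-i on {1, 2, ...}
    z, theta_hat = rescaled_posterior_draws(x, k, 20_000, rng)
    gap = np.abs(np.cov(z, rowvar=False) - inverse_fisher(theta_hat)).max()
    print(f"k_n = {k}, max covariance gap = {gap:.4f}")
```

With a Dirichlet prior the posterior is available in closed form, which is why it is used here; the theorem itself is stated for more general sequences of priors.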
