Information-theoretic asymptotics of Bayes methods

In the absence of knowledge of the true density function, Bayesian models take the joint density of a sequence of n random variables to be an average of densities with respect to a prior. The authors examine the relative entropy distance D_n between the true density and the Bayesian density and show that it is asymptotically (d/2) log n + c, where d is the dimension of the parameter vector. The relative entropy rate D_n/n therefore converges to zero at rate (log n)/n. The constant c, which the authors identify explicitly, depends only on the prior density and the Fisher information matrix evaluated at the true parameter value. Consequences are given for density estimation, universal data compression, composite hypothesis testing, and stock-market portfolio selection.
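The asymptotic can be checked numerically in the simplest smooth case. The sketch below (our illustration, not taken from the paper) uses a Bernoulli(θ) source with a uniform Beta(1,1) prior, where d = 1, the Fisher information is I(θ) = 1/(θ(1−θ)), and the prior density is w(θ) = 1. The exact redundancy D_n is computed by summing over the number of successes k, since the Bayes mixture probability of a sequence with k ones is the Beta function B(k+1, n−k+1); it is compared with the Clarke–Barron expansion (d/2) log(n/(2πe)) + (1/2) log det I(θ) − log w(θ).

```python
import math
from math import lgamma, log

def bayes_redundancy(n, theta):
    """Exact D_n = E_theta[log p_theta(X^n) - log m(X^n)] for a
    Bernoulli(theta) source under a uniform Beta(1,1) prior."""
    total = 0.0
    for k in range(n + 1):
        # log C(n, k) and log-likelihood of any sequence with k ones
        log_binom = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
        log_lik = k * log(theta) + (n - k) * log(1 - theta)
        # Bayes mixture: m(x^n) = B(k+1, n-k+1) for each such sequence
        log_m = lgamma(k + 1) + lgamma(n - k + 1) - lgamma(n + 2)
        prob_k = math.exp(log_binom + log_lik)  # P(k successes)
        total += prob_k * (log_lik - log_m)
    return total

def clarke_barron_asymptote(n, theta):
    """(d/2) log(n/(2*pi*e)) + (1/2) log det I(theta) - log w(theta),
    with d = 1, I(theta) = 1/(theta*(1-theta)), and uniform prior w = 1."""
    return 0.5 * log(n / (2 * math.pi * math.e)) \
         + 0.5 * log(1.0 / (theta * (1 - theta)))
```

For moderate n the two quantities already agree closely at interior θ, and increasing n by a factor of 4 raises D_n by roughly (1/2) log 4, matching the (d/2) log n growth.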
