Convergence rates of variational posterior distributions

We study convergence rates of variational posterior distributions for nonparametric and high-dimensional inference. We formulate general conditions on the prior, the likelihood, and the variational class that characterize these convergence rates. Under "prior mass and testing" conditions similar to those considered in the literature, the rate is shown to be the sum of two terms: the first is the convergence rate of the true posterior distribution, and the second is contributed by the variational approximation error. For a class of priors that admit the structure of a mixture of product measures, we propose a novel prior mass condition under which the variational approximation error of the generalized mean-field class is dominated by the convergence rate of the true posterior. We demonstrate the applicability of our general results for various models, prior distributions, and variational classes by deriving convergence rates of the corresponding variational posteriors.
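The "variational approximation error" behind the second term can be made concrete in a toy case. The sketch below assumes, purely for illustration, that the true posterior is a bivariate Gaussian, that the variational class is the fully factorized (mean-field) Gaussian family, and that the variational posterior is the minimizer of KL(Q || P) over that class. The function names are hypothetical and this is not the construction used in the paper; it only computes the smallest KL divergence the mean-field class can attain for a given target.

```python
import math


def kl_to_bivariate_gaussian(m, v, mu, cov):
    """KL(Q || P) in nats, where Q = N(m, diag(v)) is fully factorized and
    P = N(mu, cov) is a bivariate Gaussian with cov = ((a, b), (b, c))."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    d1, d2 = mu[0] - m[0], mu[1] - m[1]
    trace_term = (c * v[0] + a * v[1]) / det  # tr(cov^{-1} diag(v))
    quad_term = (c * d1 * d1 - 2 * b * d1 * d2 + a * d2 * d2) / det
    return 0.5 * (trace_term + quad_term - 2 + math.log(det)
                  - math.log(v[0]) - math.log(v[1]))


def mean_field_fit(mu, cov):
    """Hypothetical helper: minimizer of KL(Q || P) over fully factorized
    Gaussians Q, which keeps the mean of P and uses variances
    1 / (cov^{-1})_{ii}."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    return list(mu), [det / c, det / a]


def mean_field_gap(mu, cov):
    """Smallest KL(Q || P) attainable within the mean-field Gaussian class."""
    m, v = mean_field_fit(mu, cov)
    return kl_to_bivariate_gaussian(m, v, mu, cov)


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9):
        cov = ((1.0, rho), (rho, 1.0))
        # Closed form for unit variances: -0.5 * log(1 - rho**2).
        print(rho, mean_field_gap((0.0, 0.0), cov), -0.5 * math.log(1 - rho**2))
```

For unit variances and correlation rho the gap is -0.5 * log(1 - rho**2): zero when the coordinates are independent and growing as the correlation increases. Rescaling the covariance (as happens when a posterior contracts) leaves the gap unchanged, since it depends only on the correlation structure.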
