On Statistical Optimality of Variational Bayes

This article addresses a long-standing open problem: justifying the use of variational Bayes methods for parameter estimation. We provide general conditions for obtaining optimal risk bounds for point estimates obtained from mean-field variational Bayesian inference. The conditions pertain to the existence of certain test functions for the distance metric on the parameter space and to minimal assumptions on the prior. We outline a general recipe for verifying these conditions, which is broadly applicable to existing Bayesian models with or without latent variables. As illustrations, we discuss specific applications to Latent Dirichlet Allocation and Gaussian mixture models.
