Concentration of tempered posteriors and of their variational approximations

While Bayesian methods are extremely popular in statistics and machine learning, applying them to massive datasets is often challenging, when possible at all. Indeed, classical MCMC algorithms become prohibitively slow when both the model dimension and the sample size are large. Variational Bayes (VB) methods aim to approximate the posterior by a distribution in a tractable family; MCMC is thus replaced by an optimization algorithm that is orders of magnitude faster. VB methods have been applied in computationally demanding settings such as collaborative filtering, image and video processing, and natural language and text processing. However, despite very good results in practice, the theoretical properties of these approximations are usually unknown. In this paper, we propose a general approach to prove the concentration of variational approximations of tempered (fractional) posteriors. We apply our theory to two examples: matrix completion, and Gaussian VB.
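To make the setting concrete, here is a minimal sketch (not the paper's method) of a Gaussian variational approximation q = N(m, s²) to a tempered posterior π_α(θ) ∝ prior(θ) · likelihood(θ)^α, for a toy conjugate model y_i ~ N(θ, 1) with prior θ ~ N(0, 1). The model, step sizes, and iteration count are illustrative choices; the point is only that VB turns posterior approximation into an optimization problem (here, gradient ascent on the tempered ELBO), which in this conjugate case recovers the exact tempered posterior.

```python
import math
import random

def fit_gaussian_vb(y, alpha=0.5, lr=0.01, steps=5000):
    """Maximize the tempered ELBO over q = N(m, s^2) by gradient
    ascent on (m, log s), for the toy model y_i ~ N(theta, 1),
    prior theta ~ N(0, 1), and tempering exponent alpha in (0, 1)."""
    n = len(y)
    m, log_s = 0.0, 0.0
    for _ in range(steps):
        s2 = math.exp(2 * log_s)
        # ELBO(m, s) = -alpha/2 * sum_i (y_i - m)^2 - alpha*n*s2/2
        #              - (m^2 + s2)/2 + log(s) + const
        grad_m = alpha * sum(yi - m for yi in y) - m
        grad_log_s = -(alpha * n + 1) * s2 + 1.0  # d(ELBO)/d(log s)
        m += lr * grad_m
        log_s += lr * grad_log_s
    return m, math.exp(log_s)

random.seed(0)
y = [random.gauss(2.0, 1.0) for _ in range(50)]
alpha = 0.5
m, s = fit_gaussian_vb(y, alpha)

# In this conjugate toy model the tempered posterior is itself Gaussian,
# so VB should match the closed form N(m_star, s_star^2) exactly:
m_star = alpha * sum(y) / (alpha * len(y) + 1)
s_star = 1.0 / math.sqrt(alpha * len(y) + 1)
```

Since the variational family here contains the true tempered posterior, the optimizer converges to (m_star, s_star); in non-conjugate models (e.g. matrix completion) the same optimization runs, but the quality of the approximation is exactly what the paper's concentration theory addresses.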
