Variational Bayes on manifolds

Variational Bayes (VB) has become a widely used tool for Bayesian inference in statistics and machine learning. Nonetheless, existing VB algorithms have so far been largely restricted to the case where the variational parameter space is Euclidean, which hinders the broader application of VB methods. This paper extends the scope of VB to the case where the variational parameter space is a Riemannian manifold. We develop an efficient manifold-based VB algorithm that exploits both the geometric structure of the constrained parameter space and the information geometry of the manifold of VB approximating probability distributions. Our algorithm is provably convergent and achieves convergence rates of order $\mathcal O(1/\sqrt{T})$ for a non-convex evidence lower bound function and $\mathcal O(1/T^{2-2\epsilon})$ for a strongly retraction-convex evidence lower bound function. We develop in particular two manifold VB algorithms, Manifold Gaussian VB and Manifold Neural Net VB, and demonstrate through numerical experiments that the proposed algorithms are stable, less sensitive to initialization, and compare favourably to existing VB methods.
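
To make the idea of optimizing over a manifold-valued parameter concrete, the following is a minimal sketch of a generic Riemannian gradient ascent step on the unit sphere: the Euclidean gradient is projected onto the tangent space, and a retraction maps the update back onto the manifold. It is not the algorithm developed in this paper. The choice of manifold, the quadratic objective standing in for an evidence lower bound, the fixed step size, and all function names are assumptions made purely for illustration, and the natural-gradient (information geometry) component is omitted.

```python
import numpy as np

def project_to_tangent(x, g):
    """Project a Euclidean gradient g onto the tangent space of the unit sphere at x."""
    return g - np.dot(x, g) * x

def retract(x, v):
    """Retraction on the sphere: step along tangent vector v, then renormalize."""
    y = x + v
    return y / np.linalg.norm(y)

def riemannian_ascent(A, x0, step=0.1, n_iter=500):
    """Maximize the stand-in objective x^T A x over the unit sphere (illustrative only)."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(n_iter):
        egrad = 2.0 * A @ x                   # Euclidean gradient of x^T A x
        rgrad = project_to_tangent(x, egrad)  # Riemannian gradient on the sphere
        x = retract(x, step * rgrad)          # stay on the manifold after the step
    return x

# Hypothetical toy problem: the maximizer is the leading eigenvector of A.
A = np.diag([3.0, 2.0, 1.0])
x_hat = riemannian_ascent(A, np.array([1.0, 1.0, 1.0]))
```

Every iterate stays exactly on the sphere, so no constraint handling or reparameterization is needed. In a VB setting the exact gradient would typically be replaced by a stochastic estimate of the lower-bound gradient.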
