Variational Inference with Rényi Divergence

We introduce the variational Rényi bound (VR), which extends traditional variational inference to Rényi's α-divergences. This new family of variational lower bounds unifies a number of existing variational methods, and it enables a smooth interpolation from the evidence lower bound to the log (marginal) likelihood, controlled by the value of α. The reparameterization trick, Monte Carlo estimation and stochastic optimisation methods are deployed to obtain a unified implementation of VR bound optimisation. We further consider negative α values and propose a novel variational inference method as a new special case of the proposed framework. Experiments on variational auto-encoders and Bayesian neural networks demonstrate the wide applicability of the VR bound.
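
The role of α is easiest to see in a Monte Carlo form of the bound. The sketch below is a minimal, hypothetical Python illustration (not the authors' implementation) of a reparameterized Monte Carlo estimate of the VR bound for a diagonal Gaussian approximate posterior; the function and argument names (log_p, mu, log_sigma, num_samples) are illustrative assumptions.

```python
# Minimal sketch (not the paper's code): Monte Carlo estimate of the variational
# Renyi (VR) bound with the reparameterization trick, assuming a diagonal Gaussian
# q(z) = N(mu, diag(exp(log_sigma)^2)) and a caller-supplied log joint log_p(x, z).
import numpy as np
from scipy.special import logsumexp

def vr_bound_estimate(log_p, x, mu, log_sigma, alpha, num_samples=10, rng=None):
    """Estimate L_alpha(q; x) = 1/(1-alpha) * log E_q[(p(x, z)/q(z))^(1-alpha)].

    alpha -> 1 recovers the evidence lower bound; alpha = 0 gives the
    importance-weighted estimate of log p(x).
    """
    rng = np.random.default_rng() if rng is None else rng
    sigma = np.exp(log_sigma)
    # Reparameterization trick: z_k = mu + sigma * eps_k with eps_k ~ N(0, I),
    # so the samples stay differentiable w.r.t. (mu, log_sigma).
    eps = rng.standard_normal((num_samples, mu.size))
    z = mu + sigma * eps
    # Log importance weights: log w_k = log p(x, z_k) - log q(z_k)
    log_q = -0.5 * np.sum(eps ** 2 + 2.0 * log_sigma + np.log(2.0 * np.pi), axis=1)
    log_w = np.array([log_p(x, z_k) for z_k in z]) - log_q
    if np.isclose(alpha, 1.0):
        return float(np.mean(log_w))  # ELBO as the alpha -> 1 limit
    # 1/(1-alpha) * log( (1/K) * sum_k w_k^(1-alpha) ), via log-sum-exp for stability
    return float((logsumexp((1.0 - alpha) * log_w) - np.log(num_samples)) / (1.0 - alpha))
```

In this sketch, driving α to a large negative value makes the estimate approach the largest log importance weight among the samples, which is the flavour of the negative-α special case mentioned in the abstract.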
