Overdispersed Black-Box Variational Inference

We introduce overdispersed black-box variational inference, a method to reduce the variance of the Monte Carlo estimator of the gradient in black-box variational inference. Instead of drawing samples from the variational distribution, we use importance sampling to draw samples from an overdispersed distribution in the same exponential family as the variational approximation. Our approach is general: it can be readily applied to any exponential family distribution, which is the typical choice for the variational approximation. We run experiments on two non-conjugate probabilistic models to show that our method effectively reduces the variance and that the overhead of computing the proposal parameters and the importance weights is negligible. We find that our overdispersed importance sampling scheme provides lower variance than black-box variational inference, even when the latter uses twice the number of samples. This results in faster convergence of the black-box inference procedure.
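The core idea can be illustrated with a minimal sketch, under several assumptions not taken from the abstract: the variational family is a one-dimensional Gaussian q(z) = N(mu, sigma2) with sigma2 held fixed, only the gradient with respect to mu is estimated, and the overdispersed proposal is taken to be r(z) = N(mu, tau * sigma2) with a hand-picked dispersion tau >= 1 (how the proposal parameters should actually be computed is not shown here). The score-function estimator of the ELBO gradient is evaluated with samples from r and importance weights q(z)/r(z); tau = 1 recovers the plain black-box estimator. The function names and the toy target are hypothetical.

```python
import math
import random


def log_normal(z, mean, var):
    """Log-density of N(mean, var) evaluated at z."""
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (z - mean) ** 2 / var


def gradient_terms(log_joint, mu, sigma2, num_samples, tau=1.0, seed=0):
    """Per-sample terms of the score-function estimate of dELBO/dmu.

    Assumed variational family: q(z) = N(mu, sigma2), sigma2 fixed.
    Assumed proposal: overdispersed r(z) = N(mu, tau * sigma2), tau >= 1.
    Each term is reweighted by q(z) / r(z); tau = 1 gives r = q (plain BBVI).
    """
    rng = random.Random(seed)
    terms = []
    for _ in range(num_samples):
        z = rng.gauss(mu, math.sqrt(tau * sigma2))        # draw from r
        log_q = log_normal(z, mu, sigma2)
        weight = math.exp(log_q - log_normal(z, mu, tau * sigma2))  # q(z)/r(z)
        score = (z - mu) / sigma2                         # d/dmu log q(z)
        f = log_joint(z) - log_q                          # log p(x, z) - log q(z)
        terms.append(weight * f * score)
    return terms


def mean_and_variance(values):
    """Sample mean and (population) variance of a list of floats."""
    n = len(values)
    m = sum(values) / n
    return m, sum((v - m) ** 2 for v in values) / n


# Hypothetical toy target: log p(x, z) = log N(z; 2, 1).
# With q = N(0, 1), the exact gradient dELBO/dmu at mu = 0 equals 2.
def log_joint(z):
    return log_normal(z, 2.0, 1.0)


plain = gradient_terms(log_joint, mu=0.0, sigma2=1.0, num_samples=100_000, tau=1.0)
over = gradient_terms(log_joint, mu=0.0, sigma2=1.0, num_samples=100_000, tau=2.0)
print("tau=1: mean %.3f, per-sample variance %.3f" % mean_and_variance(plain))
print("tau=2: mean %.3f, per-sample variance %.3f" % mean_and_variance(over))
```

Both estimators are unbiased for the true gradient of 2. For this toy target the per-sample variance can be worked out in closed form: 12 for tau = 1 and 16/sqrt(3) - 4 ≈ 5.24 for tau = 2, so the overdispersed proposal helps here. The sketch does not show how to choose tau in general; the weights q/r stay bounded by sqrt(tau) because r has heavier tails than q.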
