Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm

We propose a general-purpose variational inference algorithm that forms a natural counterpart of gradient descent for optimization. Our method iteratively transports a set of particles to match the target distribution by applying a form of functional gradient descent that minimizes the KL divergence. Empirical studies are performed on various real-world models and datasets, on which our method is competitive with existing state-of-the-art methods. The derivation of our method is based on a new theoretical result that connects the derivative of the KL divergence under smooth transforms with Stein's identity and a recently proposed kernelized Stein discrepancy; this result is of independent interest.
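
To make the particle transport concrete, the following is a minimal sketch of one way the update can be written for one-dimensional particles. It is an illustration under stated assumptions, not the authors' reference implementation: the RBF kernel with a fixed bandwidth, the step size, the Gaussian target N(2, 1), and the names svgd_step and run_svgd are all chosen here for the example. Each particle moves along the average of two terms: a kernel-weighted score of the target, which pulls particles toward high-density regions, and the kernel gradient, which pushes particles apart so they do not collapse onto a single mode.

```python
import math
import random


def svgd_step(particles, grad_log_p, step_size=0.1, bandwidth=1.0):
    """One SVGD-style update for 1-D particles (RBF kernel is an assumption)."""
    n = len(particles)
    scores = [grad_log_p(x) for x in particles]
    updated = []
    for xi in particles:
        phi = 0.0
        for xj, sj in zip(particles, scores):
            k = math.exp(-((xj - xi) ** 2) / (2.0 * bandwidth ** 2))
            # Derivative of k(xj, xi) with respect to xj.
            grad_k = -(xj - xi) / bandwidth ** 2 * k
            # Attraction toward high density plus repulsion between particles.
            phi += k * sj + grad_k
        updated.append(xi + step_size * phi / n)
    return updated


def run_svgd(n_particles=50, n_iters=500, seed=0):
    """Transport particles from a poor initial guess toward N(2, 1)."""
    rng = random.Random(seed)

    # Illustrative target N(2, 1); its score function is -(x - 2).
    def grad_log_p(x):
        return -(x - 2.0)

    # Deliberately start far from the target.
    particles = [rng.gauss(-5.0, 1.0) for _ in range(n_particles)]
    for _ in range(n_iters):
        particles = svgd_step(particles, grad_log_p)
    return particles


if __name__ == "__main__":
    final = run_svgd()
    mean = sum(final) / len(final)
    var = sum((x - mean) ** 2 for x in final) / len(final)
    print(f"particle mean: {mean:.3f}, particle variance: {var:.3f}")
```

Only the gradient of the log density is needed, so the normalizing constant of the target never has to be computed. In practice the bandwidth is often adapted to the particles rather than fixed, and the step size is usually set by an adaptive scheme; both are held constant here for brevity.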
