Generative Particle Variational Inference via Estimation of Functional Gradients

Recently, particle-based variational inference (ParVI) methods have gained interest because they directly minimize the Kullback-Leibler (KL) divergence and do not suffer from the approximation error introduced by the evidence lower bound (ELBO). However, many ParVI approaches do not allow drawing arbitrary new samples from the posterior, and the few that do suffer from suboptimal approximation quality. This work proposes a new method for learning to approximately sample from the posterior distribution. We construct a neural sampler that is trained with the functional gradient of the KL divergence between the empirical sampling distribution and the target distribution, assuming the gradient lies in a reproducing kernel Hilbert space. Our generative ParVI (GPVI) approach maintains the asymptotic performance of ParVI methods while offering the flexibility of a generative sampler. Through carefully constructed experiments, we show that GPVI outperforms previous generative ParVI methods such as amortized SVGD, and that it is competitive with ParVI as well as gold-standard approaches such as Hamiltonian Monte Carlo for fitting both exactly known and intractable target distributions.

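As a rough illustration of this training scheme (a minimal sketch, not the paper's exact estimator or architecture), the following PyTorch code pushes particles produced by a small generator network along an SVGD-style estimate of the KL functional gradient in an RKHS with an RBF kernel, and chains that gradient back through the generator's parameters. The kernel choice, bandwidth, network sizes, and the stand-in Gaussian target are illustrative assumptions.

import torch

def rbf_kernel(x, y, bandwidth=1.0):
    # Pairwise RBF kernel matrix k(x_i, y_j) between two particle sets.
    sq_dists = torch.cdist(x, y) ** 2
    return torch.exp(-sq_dists / (2.0 * bandwidth ** 2))

def functional_gradient(particles, log_prob_fn, bandwidth=1.0):
    # SVGD-style estimate of the KL functional gradient in an RKHS,
    # evaluated at the current particles (illustrative only).
    x = particles.detach().requires_grad_(True)
    score = torch.autograd.grad(log_prob_fn(x).sum(), x)[0]   # grad_x log p(x), shape (n, d)
    k = rbf_kernel(x, x.detach(), bandwidth)                   # (n, n); grad flows through the first argument only
    repulsion = -torch.autograd.grad(k.sum(), x)[0]            # sum_j grad_{x_j} k(x_j, x_i), spreads particles apart
    return ((k.detach() @ score + repulsion) / x.shape[0]).detach()

def log_prob(x):
    # Stand-in target: log-density of a standard Gaussian (up to a constant).
    return -0.5 * (x ** 2).sum(dim=1)

# Hypothetical generator network mapping noise to particles.
sampler = torch.nn.Sequential(torch.nn.Linear(8, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2))
optimizer = torch.optim.Adam(sampler.parameters(), lr=1e-3)

for step in range(2000):
    particles = sampler(torch.randn(128, 8))                   # draw fresh samples from the generator
    phi = functional_gradient(particles, log_prob)
    optimizer.zero_grad()
    # Chain the functional gradient through the sampler's parameters:
    # accumulates -(d particles / d theta)^T phi, so the optimizer moves theta along phi.
    particles.backward(gradient=-phi)
    optimizer.step()

Because the functional gradient is treated as a fixed update direction for the particles, the generator update reduces to a single backward pass with an explicit upstream gradient; after training, new approximate posterior samples can be drawn simply by passing fresh noise through the sampler.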