Generative Particle Variational Inference via Estimation of Functional Gradients

Recently, particle-based variational inference (ParVI) methods have gained interest because they directly minimize the Kullback-Leibler (KL) divergence and do not suffer from the approximation error introduced by the evidence lower bound (ELBO). However, many ParVI approaches do not allow drawing arbitrary new samples from the posterior, and the few that do suffer from suboptimal approximation quality. This work proposes a new method for learning to approximately sample from the posterior distribution. We construct a neural sampler that is trained with the functional gradient of the KL divergence between the empirical sampling distribution and the target distribution, assuming the gradient lies in a reproducing kernel Hilbert space. Our generative ParVI (GPVI) approach maintains the asymptotic performance of ParVI methods while offering the flexibility of a generative sampler. Through carefully constructed experiments, we show that GPVI outperforms previous generative ParVI methods such as amortized SVGD, and that it is competitive with ParVI as well as gold-standard approaches such as Hamiltonian Monte Carlo for fitting both exactly known and intractable target distributions.

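As a rough illustration of this training scheme (a minimal sketch, not the paper's exact estimator or architecture), the following PyTorch code pushes particles produced by a small generator network along an SVGD-style estimate of the KL functional gradient in an RKHS with an RBF kernel, and chains that gradient back through the generator's parameters. The kernel choice, bandwidth, network sizes, and the stand-in Gaussian target are illustrative assumptions.

import torch

def rbf_kernel(x, y, bandwidth=1.0):
    # Pairwise RBF kernel matrix k(x_i, y_j) between two particle sets.
    sq_dists = torch.cdist(x, y) ** 2
    return torch.exp(-sq_dists / (2.0 * bandwidth ** 2))

def functional_gradient(particles, log_prob_fn, bandwidth=1.0):
    # SVGD-style estimate of the KL functional gradient in an RKHS,
    # evaluated at the current particles (illustrative only).
    x = particles.detach().requires_grad_(True)
    score = torch.autograd.grad(log_prob_fn(x).sum(), x)[0]   # grad_x log p(x), shape (n, d)
    k = rbf_kernel(x, x.detach(), bandwidth)                   # (n, n); grad flows through the first argument only
    repulsion = -torch.autograd.grad(k.sum(), x)[0]            # sum_j grad_{x_j} k(x_j, x_i), spreads particles apart
    return ((k.detach() @ score + repulsion) / x.shape[0]).detach()

def log_prob(x):
    # Stand-in target: log-density of a standard Gaussian (up to a constant).
    return -0.5 * (x ** 2).sum(dim=1)

# Hypothetical generator network mapping noise to particles.
sampler = torch.nn.Sequential(torch.nn.Linear(8, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2))
optimizer = torch.optim.Adam(sampler.parameters(), lr=1e-3)

for step in range(2000):
    particles = sampler(torch.randn(128, 8))                   # draw fresh samples from the generator
    phi = functional_gradient(particles, log_prob)
    optimizer.zero_grad()
    # Chain the functional gradient through the sampler's parameters:
    # accumulates -(d particles / d theta)^T phi, so the optimizer moves theta along phi.
    particles.backward(gradient=-phi)
    optimizer.step()

Because the functional gradient is treated as a fixed update direction for the particles, the generator update reduces to a single backward pass with an explicit upstream gradient; after training, new approximate posterior samples can be drawn simply by passing fresh noise through the sampler.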