Stein Variational Gaussian Processes

We show how to use Stein variational gradient descent (SVGD) to carry out inference in Gaussian process (GP) models with non-Gaussian likelihoods and large data volumes. Markov chain Monte Carlo (MCMC) is extremely computationally intensive in these settings, while the parametric assumptions required for efficient variational inference (VI) lead to incorrect inference for the multi-modal posterior distributions that are common in such models. SVGD provides a non-parametric alternative to VI that is substantially faster than MCMC but unhindered by parametric assumptions. We prove that, for GP models with Lipschitz gradients, the SVGD algorithm monotonically decreases the Kullback-Leibler divergence from the sampling distribution to the true posterior. Our method is demonstrated on benchmark regression and classification problems, and on a real air-quality example with 11,440 spatiotemporal observations, showing substantial performance improvements over MCMC and VI.
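To make the particle update underlying SVGD concrete, the sketch below shows the standard SVGD update (kernel-weighted score term plus repulsive kernel-gradient term) with an RBF kernel in plain NumPy. The fixed bandwidth `h`, the step size, and the toy Gaussian target are illustrative assumptions and do not reflect the configuration used in the paper.

```python
import numpy as np

def rbf_kernel(X, h=1.0):
    # Pairwise RBF kernel K[i, j] = exp(-||x_i - x_j||^2 / (2 h^2))
    diffs = X[:, None, :] - X[None, :, :]            # (n, n, d)
    K = np.exp(-np.sum(diffs ** 2, axis=-1) / (2 * h ** 2))
    # grad_K[i, j, :] = d k(x_i, x_j) / d x_i
    grad_K = -diffs / h ** 2 * K[:, :, None]
    return K, grad_K

def svgd_step(X, grad_log_p, step_size=1e-2, h=1.0):
    # One SVGD update for particles X of shape (n, d):
    #   phi(x_i) = (1/n) * sum_j [ k(x_j, x_i) grad log p(x_j) + grad_{x_j} k(x_j, x_i) ]
    n = X.shape[0]
    K, grad_K = rbf_kernel(X, h)
    # K is symmetric, and grad_K[j, i, :] = grad_{x_j} k(x_j, x_i), so summing
    # over axis 0 gives the repulsive term for each particle i.
    phi = (K @ grad_log_p(X) + grad_K.sum(axis=0)) / n
    return X + step_size * phi

if __name__ == "__main__":
    # Toy target: standard 2-D Gaussian, so grad log p(x) = -x (illustrative only).
    rng = np.random.default_rng(0)
    particles = rng.normal(loc=3.0, scale=1.0, size=(50, 2))
    for _ in range(500):
        particles = svgd_step(particles, lambda x: -x, step_size=0.1)
    print(particles.mean(axis=0))  # particles should have moved toward the origin
```

The first term drives particles toward regions of high posterior density, while the kernel-gradient term acts as a repulsive force that keeps the particle set spread out; this is what lets SVGD represent the multi-modal posteriors that defeat parametric VI.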
