On Signal-to-Noise Ratio Issues in Variational Inference for Deep Gaussian Processes

We show that the gradient estimates used to train Deep Gaussian Processes (DGPs) with importance-weighted variational inference are susceptible to signal-to-noise ratio (SNR) issues. Specifically, we show both theoretically and empirically that the SNR of the gradient estimates for the latent variable's variational parameters decreases as the number of importance samples increases. As a result, these gradient estimates degrade to pure noise if the number of importance samples is too large. To address this pathology, we show how doubly reparameterized gradient estimators, originally proposed for training variational autoencoders, can be adapted to the DGP setting, and that the resulting estimators completely remedy the SNR issue, thereby providing more reliable training. Finally, we demonstrate that our fix can lead to improvements in the model's predictive performance.
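To make the quantities in the abstract concrete, below is a minimal NumPy sketch on a toy conjugate-Gaussian problem of our own construction (it is not the DGP model from the paper, and the names grad_estimates and dreg, as well as the specific values of mu, y, and phi, are illustrative assumptions). The SNR of a K-sample gradient estimator Delta_K is taken to be |E[Delta_K]| / std[Delta_K]; the sketch compares the standard reparameterized importance-weighted gradient with a doubly reparameterized (DReG-style) gradient as the number of importance samples K grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative toy problem (our own construction, not the DGP from the paper):
# prior z ~ N(mu, 1), likelihood y | z ~ N(z, 1), observation y = 1, and a
# variational proposal q_phi(z) = N(phi, 1) reparameterized as z = phi + eps
# with eps ~ N(0, 1).
mu, y, phi = 0.0, 1.0, 2.0

def grad_estimates(K, n_reps, dreg=False):
    """n_reps independent K-sample estimates of the gradient of the
    importance-weighted bound L_K with respect to phi."""
    eps = rng.standard_normal((n_reps, K))
    z = phi + eps
    # Log importance weights: log p(z, y) - log q_phi(z) (up to a constant).
    log_w = -0.5 * (z - mu) ** 2 - 0.5 * (y - z) ** 2 + 0.5 * (z - phi) ** 2
    w_tilde = np.exp(log_w - log_w.max(axis=1, keepdims=True))
    w_tilde /= w_tilde.sum(axis=1, keepdims=True)       # self-normalized weights
    dlogw_dz = -(z - mu) + (y - z) + (z - phi)           # pathwise derivative
    if dreg:
        # Doubly reparameterized (DReG-style) estimator: squared normalized
        # weights, pathwise term only.
        return np.sum(w_tilde ** 2 * dlogw_dz, axis=1)
    # Standard reparameterized importance-weighted gradient:
    # pathwise term plus score-function term from log q_phi.
    dlogw_dphi = -(z - phi)
    return np.sum(w_tilde * (dlogw_dz + dlogw_dphi), axis=1)

def snr(g):
    """Signal-to-noise ratio |E[g]| / std[g] of a gradient estimator."""
    return np.abs(g.mean()) / g.std()

for K in (1, 10, 100, 1000):
    print(f"K={K:4d}  SNR(standard)={snr(grad_estimates(K, 5000)):6.3f}"
          f"  SNR(DReG)={snr(grad_estimates(K, 5000, dreg=True)):6.3f}")
```

On this toy problem, the printed SNR of the standard estimator should shrink as K grows while the DReG-style estimator's should not, mirroring the pathology and the fix described above.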
