Practical Bayesian Learning of Neural Networks via Adaptive Optimisation Methods

We introduce a novel framework for estimating the posterior distribution over the weights of a neural network, based on a new probabilistic interpretation of adaptive optimisation algorithms such as AdaGrad and Adam. We demonstrate the effectiveness of our Bayesian Adam method, Badam, by showing experimentally, through weight pruning, that the learnt uncertainties correctly reflect the weights' predictive capabilities. We further assess the quality of the derived uncertainty measures by comparing Badam to standard methods in a Thompson sampling setting for multi-armed bandits, where good uncertainty estimates are required for an agent to balance exploration and exploitation.
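
The exact Badam update is not given in the abstract, so the sketch below only illustrates the general idea it describes, under stated assumptions: run Adam-style moment updates, then reuse the bias-corrected second-moment estimate as a diagonal precision proxy for an approximate Gaussian posterior over the weights, from which one can draw Thompson samples or compute per-weight signal-to-noise ratios for pruning. The function names, the toy loss, and the 1/(N·v̂) variance scaling are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

# Hypothetical sketch: maintain Adam-style moment estimates for a weight
# vector and reuse the second-moment estimate as a diagonal precision proxy
# for an approximate Gaussian posterior over the weights. This is NOT the
# paper's exact Badam update, only an illustration of the general idea.

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update; returns new weights and updated moment estimates."""
    m = b1 * m + (1 - b1) * grad          # first moment (EMA of gradients)
    v = b2 * v + (1 - b2) * grad ** 2     # second moment (EMA of squared gradients)
    m_hat = m / (1 - b1 ** t)             # bias correction
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

def approximate_posterior(w, v, t, n_data, b2=0.999, eps=1e-8):
    """Diagonal Gaussian q(w) with variance ~ 1 / (N * v_hat): an assumed
    precision proxy read off the optimiser; the paper's scaling may differ."""
    v_hat = v / (1 - b2 ** t)
    sigma2 = 1.0 / (n_data * v_hat + eps)
    return w, sigma2                      # per-weight mean and variance

# Usage: optimise a toy quadratic loss, then sample weights (Thompson-style
# exploration) or rank them by signal-to-noise ratio |mean| / std for pruning.
rng = np.random.default_rng(0)
w = rng.normal(size=10)
m = np.zeros(10)
v = np.zeros(10)
for t in range(1, 101):
    grad = w - 1.0                        # gradient of 0.5 * (w - 1)^2
    w, m, v = adam_step(w, grad, m, v, t)
mean, var = approximate_posterior(w, v, t=100, n_data=1000)
w_sample = mean + np.sqrt(var) * rng.normal(size=10)   # Thompson sample
snr = np.abs(mean) / np.sqrt(var)                      # pruning score
```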
