Liberty or Depth: Deep Bayesian Neural Nets Do Not Need Complex Weight Posterior Approximations

We challenge the longstanding assumption that the mean-field approximation for variational inference in Bayesian neural networks is severely restrictive, and show this is not the case in deep networks. We prove several results indicating that deep mean-field variational weight posteriors can induce similar distributions in function-space to those induced by shallower networks with complex weight posteriors. We validate our theoretical contributions empirically, both by examining the weight posterior with Hamiltonian Monte Carlo in small models and by comparing diagonal-covariance to structured-covariance approximations in large-scale settings. Since complex variational posteriors are often expensive and cumbersome to implement, our results suggest that using mean-field variational inference in a deeper model is both a practical and theoretically justified alternative to structured approximations.
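
To make the object under study concrete, the sketch below shows a mean-field (fully factorised Gaussian) variational linear layer in the Bayes-by-Backprop style: one independent mean and standard deviation per weight, trained with the reparameterisation trick. This is a minimal illustration, not the paper's exact setup; the class name `MeanFieldLinear`, the initialisation constants, and the unit-Gaussian prior are all illustrative assumptions.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class MeanFieldLinear(nn.Module):
    """Linear layer with a fully factorised (mean-field) Gaussian posterior:
    one independent mean and standard deviation per weight, so the weight
    covariance is diagonal and carries no cross-weight correlations."""

    def __init__(self, in_features: int, out_features: int, prior_std: float = 1.0):
        super().__init__()
        self.prior_std = prior_std  # prior p(w) = N(0, prior_std^2 I); illustrative choice
        self.w_mu = nn.Parameter(0.1 * torch.randn(out_features, in_features))
        self.w_logstd = nn.Parameter(torch.full((out_features, in_features), -3.0))
        self.b_mu = nn.Parameter(torch.zeros(out_features))
        self.b_logstd = nn.Parameter(torch.full((out_features,), -3.0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reparameterisation trick: w = mu + sigma * eps with eps ~ N(0, I),
        # so gradients flow to the variational parameters through the sample.
        w = self.w_mu + self.w_logstd.exp() * torch.randn_like(self.w_mu)
        b = self.b_mu + self.b_logstd.exp() * torch.randn_like(self.b_mu)
        return F.linear(x, w, b)

    def kl(self) -> torch.Tensor:
        # Closed-form KL(q || p) between diagonal Gaussians, summed over weights:
        # KL(N(mu, s^2) || N(0, p^2)) = 0.5 * [(s^2 + mu^2)/p^2 - 1 - log s^2 + log p^2]
        def term(mu, logstd):
            var, prior_var = (2 * logstd).exp(), self.prior_std ** 2
            return 0.5 * ((var + mu ** 2) / prior_var - 1.0
                          - 2 * logstd + math.log(prior_var)).sum()
        return term(self.w_mu, self.w_logstd) + term(self.b_mu, self.b_logstd)


# A deeper mean-field model is just a stack of such layers; the negative ELBO
# is the expected negative log-likelihood plus the sum of per-layer KL terms.
net = nn.Sequential(MeanFieldLinear(784, 128), nn.ReLU(), MeanFieldLinear(128, 10))
kl_total = sum(m.kl() for m in net if isinstance(m, MeanFieldLinear))
```

The appeal of the diagonal parameterisation is its cost: the number of variational parameters grows linearly in the number of weights, whereas full or structured covariances grow quadratically per layer. The paper's argument is that depth in a stack of such layers can induce function-space distributions comparable to those of shallower networks with richer weight posteriors, making "go deeper with mean-field" a justified alternative to structured approximations.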
