Bayesian Deep Learning via Subnetwork Inference

The Bayesian paradigm has the potential to solve core issues of deep neural networks such as poor calibration and data inefficiency. Alas, scaling Bayesian inference to large weight spaces often requires restrictive approximations. In this work, we show that it suffices to perform inference over a small subset of model weights in order to obtain accurate predictive posteriors. The other weights are kept as point estimates. This subnetwork inference framework enables us to use expressive, otherwise intractable, posterior approximations over such subsets. In particular, we implement subnetwork linearized Laplace as a simple, scalable Bayesian deep learning method: We first obtain a MAP estimate of all weights and then infer a full-covariance Gaussian posterior over a subnetwork using the linearized Laplace approximation. We propose a subnetwork selection strategy that aims to maximally preserve the model’s predictive uncertainty. Empirically, our approach compares favorably to ensembles and less expressive posterior approximations over full networks.
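To make the pipeline concrete, here is a minimal PyTorch sketch of subnetwork linearized Laplace for regression: starting from (notionally) MAP-trained weights, select a small subnetwork, form a generalized Gauss–Newton posterior over only those weights, and push it through the network's linearization to obtain predictive variances. The toy data, the magnitude-based selection (a stand-in for the paper's Wasserstein-based selection strategy), and the helper function names are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
from torch.func import functional_call, jacrev

torch.manual_seed(0)

# Toy 1D regression data and a small network; MAP training loop omitted for brevity.
X = torch.linspace(-3, 3, 64).unsqueeze(-1)
y = torch.sin(X) + 0.1 * torch.randn_like(X)
model = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))

params = {k: v.detach() for k, v in model.named_parameters()}
flat = torch.cat([p.flatten() for p in params.values()])   # all D weights
D = flat.numel()

# Subnetwork selection: keep the S largest-magnitude weights
# (illustrative placeholder for the paper's selection strategy).
S = 50
idx = flat.abs().topk(S).indices

def f(theta_flat, x):
    # Rebuild the parameter dict from a flat vector and evaluate the network.
    out, offset = {}, 0
    for k, v in params.items():
        out[k] = theta_flat[offset:offset + v.numel()].view_as(v)
        offset += v.numel()
    return functional_call(model, out, (x,))

# Jacobian of predictions w.r.t. all weights, restricted to the subnetwork.
J_full = jacrev(f)(flat, X).squeeze(1)      # shape (N, D)
J = J_full[:, idx]                          # shape (N, S)

# Full-covariance Gauss-Newton/Laplace posterior over the subnetwork only:
# Sigma = (J^T J / sigma^2 + prior_precision * I)^-1
sigma2, prior_prec = 0.1 ** 2, 1.0
Sigma = torch.linalg.inv(J.T @ J / sigma2 + prior_prec * torch.eye(S))

# Linearized predictive: mean from the point estimate, variance from the subnetwork posterior.
pred_mean = f(flat, X)
pred_var = (J @ Sigma * J).sum(-1, keepdim=True) + sigma2
print(pred_mean.shape, pred_var.shape)
```

For classification, the per-output Hessian of the log-likelihood would replace the 1/sigma^2 scaling in the Gauss–Newton term, but the subnetwork structure of the posterior is unchanged.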
