Expressive yet Tractable Bayesian Deep Learning via Subnetwork Inference

The Bayesian paradigm has the potential to solve some of the core issues in modern deep learning, such as poor calibration, data inefficiency, and catastrophic forgetting. However, scaling Bayesian inference to the high-dimensional parameter spaces of deep neural networks requires restrictive approximations. In this paper, we propose performing inference over only a small subset of the model parameters while keeping all others as point estimates. This enables us to use expressive posterior approximations that would otherwise be intractable for the full model. In particular, we develop a practical and scalable Bayesian deep learning method that first trains a point estimate, and then infers a full-covariance Gaussian posterior approximation over a subnetwork. We propose a subnetwork selection procedure that aims to optimally preserve posterior uncertainty. We empirically demonstrate the effectiveness of our approach compared to point-estimated networks and methods that use less expressive posterior approximations over the full network.
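To make the recipe concrete, the sketch below illustrates the overall idea (not the authors' implementation) on a toy regression network in PyTorch: train a MAP estimate, score weights with a cheap diagonal Laplace approximation, keep the k highest-variance weights as the subnetwork, and fit a full-covariance Gaussian (GGN/Laplace) posterior over those weights only. The Gaussian likelihood, the variance-based selection rule used in place of the paper's selection criterion, and all names and hyperparameters (`sigma2`, `prior_prec`, `k`) are illustrative assumptions.

```python
# Minimal sketch of subnetwork Laplace inference on a toy 1-D regression task.
# Assumptions: Gaussian likelihood with noise variance sigma2, isotropic
# Gaussian prior with precision prior_prec, subnetwork chosen by largest
# diagonal-Laplace marginal variance (a stand-in for the paper's criterion).
import torch
import torch.nn as nn

torch.manual_seed(0)

# 1. Toy data and a point-estimated (MAP) network.
x = torch.linspace(-2, 2, 64).unsqueeze(1)
y = torch.sin(3 * x) + 0.1 * torch.randn_like(x)
net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
for _ in range(2000):
    opt.zero_grad()
    ((net(x) - y) ** 2).mean().backward()
    opt.step()

params = list(net.parameters())
n_params = sum(p.numel() for p in params)
sigma2, prior_prec, k = 0.1 ** 2, 1.0, 50

def flat_output_grad(xi):
    """Gradient of the scalar network output at xi w.r.t. all weights, flattened."""
    out = net(xi.unsqueeze(0)).squeeze()
    grads = torch.autograd.grad(out, params)
    return torch.cat([g.reshape(-1) for g in grads])

# 2. Score weights via a diagonal GGN/Laplace marginal variance.
diag_ggn = torch.zeros(n_params)
for xi in x:
    g = flat_output_grad(xi)
    diag_ggn += g ** 2 / sigma2
marginal_var = 1.0 / (diag_ggn + prior_prec)

# 3. Subnetwork = the k weights with the largest marginal variance.
sub_idx = torch.topk(marginal_var, k).indices

# 4. Full-covariance Gaussian (Laplace/GGN) posterior over the subnetwork only.
ggn_sub = torch.zeros(k, k)
for xi in x:
    g_sub = flat_output_grad(xi)[sub_idx]
    ggn_sub += torch.outer(g_sub, g_sub) / sigma2
post_cov = torch.linalg.inv(ggn_sub + prior_prec * torch.eye(k))
post_cov = 0.5 * (post_cov + post_cov.T)  # symmetrize for numerical safety

# 5. Predict by sampling subnetwork weights, keeping all other weights fixed.
flat_map = torch.cat([p.detach().reshape(-1) for p in params])
posterior = torch.distributions.MultivariateNormal(flat_map[sub_idx], post_cov)

def set_flat_params(flat):
    i = 0
    for p in params:
        n = p.numel()
        p.data.copy_(flat[i:i + n].reshape(p.shape))
        i += n

x_test = torch.linspace(-3, 3, 100).unsqueeze(1)
preds = []
for _ in range(30):
    sample = flat_map.clone()
    sample[sub_idx] = posterior.sample()
    set_flat_params(sample)
    preds.append(net(x_test).detach())
set_flat_params(flat_map)  # restore the MAP weights
print("predictive std:", torch.stack(preds).std(0).squeeze()[:5])
```

Because only k of the weights receive a full-covariance posterior, the covariance matrix is k x k rather than (number of weights) squared, which is what makes the otherwise intractable full-covariance approximation affordable.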
