Learnable Uncertainty under Laplace Approximations

Laplace approximations are classic, computationally lightweight means for constructing Bayesian neural networks (BNNs). As in other approximate BNNs, one cannot necessarily expect the induced predictive uncertainty to be calibrated. Here we develop a formalism to explicitly "train" the uncertainty in a decoupled way to the prediction itself. To this end we introduce uncertainty units for Laplace-approximated networks: Hidden units with zero weights that can be added to any pre-trained, point-estimated network. Since these units are inactive, they do not affect the predictions. But their presence changes the geometry (in particular the Hessian) of the loss landscape around the point estimate, thereby affecting the network's uncertainty estimates under a Laplace approximation. We show that such units can be trained via an uncertainty-aware objective, making the Laplace approximation competitive with more expensive alternative uncertainty-quantification frameworks.

[1]  Jason Yosinski,et al.  Deep neural networks are easily fooled: High confidence predictions for unrecognizable images , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Nikos Komodakis,et al.  Wide Residual Networks , 2016, BMVC.

[3]  Radford M. Neal Bayesian Learning via Stochastic Dynamics , 1992, NIPS.

[4]  David J. C. MacKay,et al.  A Practical Bayesian Framework for Backpropagation Networks , 1992, Neural Computation.

[5]  Tom Heskes,et al.  On Natural Learning and Pruning in Multilayered Perceptrons , 2000, Neural Computation.

[6]  Frederik Kunstner,et al.  BackPACK: Packing more into backprop , 2020, ICLR.

[7]  G. Brier VERIFICATION OF FORECASTS EXPRESSED IN TERMS OF PROBABILITY , 1950 .

[8]  Kibok Lee,et al.  Training Confidence-calibrated Classifiers for Detecting Out-of-Distribution Samples , 2017, ICLR.

[9]  David J. C. MacKay,et al.  The Evidence Framework Applied to Classification Networks , 1992, Neural Computation.

[10]  Alexandre Hoang Thiery,et al.  Uncertainty Quantification and Deep Ensembles , 2020, NeurIPS.

[11]  Frank Hutter,et al.  SGDR: Stochastic Gradient Descent with Warm Restarts , 2016, ICLR.

[12]  Alex Graves,et al.  Practical Variational Inference for Neural Networks , 2011, NIPS.

[13]  Thomas G. Dietterich,et al.  Deep Anomaly Detection with Outlier Exposure , 2018, ICLR.

[14]  Geoffrey E. Hinton,et al.  Bayesian Learning for Neural Networks , 1995 .

[15]  Andrey Malinin,et al.  Reverse KL-Divergence Training of Prior Networks: Improved Uncertainty and Adversarial Robustness , 2019, NeurIPS.

[16]  David Barber,et al.  Online Structured Laplace Approximations For Overcoming Catastrophic Forgetting , 2018, NeurIPS.

[17]  Frank Hutter,et al.  A Downsampled Variant of ImageNet as an Alternative to the CIFAR datasets , 2017, ArXiv.

[18]  Richard E. Turner,et al.  'In-Between' Uncertainty in Bayesian Neural Networks , 2019, ArXiv.

[19]  David J. Spiegelhalter,et al.  Sequential updating of conditional probabilities on directed graphical structures , 1990, Networks.

[20]  Agustinus Kristiadi,et al.  Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks , 2020, ICML.

[21]  Geoffrey E. Hinton,et al.  Keeping the neural networks simple by minimizing the description length of the weights , 1993, COLT '93.

[22]  Sebastian W. Ober,et al.  Benchmarking the Neural Linear Model for Regression , 2019, ArXiv.

[23]  Charles Blundell,et al.  Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles , 2016, NIPS.

[24]  Julien Cornebise,et al.  Weight Uncertainty in Neural Networks , 2015, ArXiv.

[25]  Michael W. Mahoney,et al.  PyHessian: Neural Networks Through the Lens of the Hessian , 2019, 2020 IEEE International Conference on Big Data (Big Data).

[26]  Kilian Q. Weinberger,et al.  On Calibration of Modern Neural Networks , 2017, ICML.

[27]  Radford M. Neal Pattern Recognition and Machine Learning , 2007, Technometrics.

[28]  David Barber,et al.  A Scalable Laplace Approximation for Neural Networks , 2018, ICLR.

[29]  H. Robbins An Empirical Bayes Approach to Statistics , 1956 .

[30]  Matthias Hein,et al.  Why ReLU Networks Yield High-Confidence Predictions Far Away From the Training Data and How to Mitigate the Problem , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Ryan P. Adams,et al.  Probabilistic Backpropagation for Scalable Learning of Bayesian Neural Networks , 2015, ICML.

[32]  Zoubin Ghahramani,et al.  Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning , 2015, ICML.

[33]  Matthias Hein,et al.  Certifiably Adversarially Robust Detection of Out-of-Distribution Data , 2020, NeurIPS.

[34]  Alexander Immer,et al.  Improving predictions of Bayesian neural networks via local linearization , 2020, ArXiv.

[35]  James Martens,et al.  New Insights and Perspectives on the Natural Gradient Method , 2014, J. Mach. Learn. Res..

[36]  Yee Whye Teh,et al.  Bayesian Learning via Stochastic Gradient Langevin Dynamics , 2011, ICML.

[37]  Milos Hauskrecht,et al.  Obtaining Well Calibrated Probabilities Using Bayesian Binning , 2015, AAAI.

[38]  Antonio Torralba,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence 1 80 Million Tiny Images: a Large Dataset for Non-parametric Object and Scene Recognition , 2022 .

[39]  Roger B. Grosse,et al.  Optimizing Neural Networks with Kronecker-factored Approximate Curvature , 2015, ICML.

[40]  Mark J. F. Gales,et al.  Predictive Uncertainty Estimation via Prior Networks , 2018, NeurIPS.

[41]  Shun-ichi Amari,et al.  Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.

[42]  Kevin Gimpel,et al.  A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks , 2016, ICLR.

[43]  Sebastian Nowozin,et al.  Can You Trust Your Model's Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift , 2019, NeurIPS.

[44]  Matthias Hein,et al.  Towards neural networks that provably know when they don't know , 2020, ICLR.