Post-hoc loss-calibration for Bayesian neural networks

Bayesian decision theory provides an elegant framework for acting optimally under uncertainty when tractable posterior distributions are available. Modern Bayesian models, however, typically involve intractable posteriors that are approximated with potentially crude surrogates. This difficulty has motivated loss-calibrated techniques that aim to learn posterior approximations favoring high-utility decisions. In this paper, focusing on Bayesian neural networks, we develop methods for correcting approximate posterior predictive distributions, encouraging them to prefer high-utility decisions. In contrast to previous work, our approach is agnostic to the choice of approximate inference algorithm, allows for efficient test-time decision making through amortization, and empirically produces higher-quality decisions. We demonstrate the effectiveness of our approach through controlled experiments spanning a diverse set of tasks and datasets.
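To make the decision-theoretic setting concrete, the sketch below shows the standard Bayes-optimal rule the abstract alludes to: choose the action maximizing expected utility under an (approximate) posterior predictive, here estimated by Monte Carlo averaging. This is a minimal illustration of the general principle, not the authors' correction or amortization method; the names `bayes_optimal_decision`, `predictive_samples`, and the example utility matrix are hypothetical.

```python
# Minimal sketch (assumed setup, not the paper's method): Bayes-optimal action
# selection from Monte Carlo samples of an approximate posterior predictive.
import numpy as np

def bayes_optimal_decision(predictive_samples, utility):
    """predictive_samples: (S, C) array of class probabilities from S posterior
    draws; utility: (C, A) matrix giving the utility of taking action a when
    the true class is c. Returns the index of the action with the highest
    posterior expected utility."""
    # Approximate the posterior predictive p(y | x, D) by averaging over draws.
    p_pred = predictive_samples.mean(axis=0)      # shape (C,)
    expected_utility = p_pred @ utility           # shape (A,)
    return int(np.argmax(expected_utility))

# Example: 3 classes, asymmetric utility that heavily penalizes predicting
# class 2 when the true class is 0 (e.g., missing a hazardous condition).
samples = np.random.dirichlet(alpha=[2.0, 1.0, 1.0], size=100)   # (S=100, C=3)
U = np.array([[ 1.0, 0.0, -5.0],
              [ 0.0, 1.0,  0.0],
              [ 0.0, 0.0,  1.0]])                                 # (C=3, A=3)
print(bayes_optimal_decision(samples, U))
```

With such an asymmetric utility, the selected action can differ from the most probable class, which is the gap that loss-calibrated approximations aim to account for when the posterior itself is only a crude surrogate.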
