Fast Uncertainty Estimates and Bayesian Model Averaging of DNNs

Reliable uncertainty estimates for both the weights and the predictions of deep learning models have proven difficult to obtain, owing to the size and complexity of the models involved. In many other areas, Bayesian methods are used to quantify uncertainty and to incorporate prior knowledge into the modelling process; however, they often suffer from scalability issues when applied to deep learning models. We extend the recently developed stochastic weight averaging (SWA) procedure in a simple and computationally efficient manner, forming a Gaussian approximation to the true posterior distribution over the weights. This procedure, termed SWA-Gaussian (SWAG), produces reliable uncertainty estimates while maintaining accuracy under Bayesian model averaging. Code is available at https://github.com/wjmaddox/swa_uncertainties.
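
To make the idea concrete, the following is a minimal sketch of a diagonal-Gaussian variant of the approach described above, assuming we already have a list of flattened weight vectors collected from late-stage SGD iterates: the first and second moments of the iterates define a Gaussian over the weights, and predictions are averaged over weight samples drawn from it. This is an illustrative sketch, not the authors' implementation (see the repository linked above); the function names, including the predict_fn argument, are hypothetical.

```python
import numpy as np

def fit_swag_diagonal(weight_iterates):
    """Fit a diagonal Gaussian from first and second moments of SGD weight iterates."""
    iterates = np.stack(weight_iterates)             # shape: (num_snapshots, num_params)
    mean = iterates.mean(axis=0)                     # running average of weights (the SWA solution)
    sq_mean = (iterates ** 2).mean(axis=0)           # running average of squared weights
    var = np.clip(sq_mean - mean ** 2, 1e-30, None)  # diagonal covariance estimate, kept non-negative
    return mean, var

def sample_weights(mean, var, rng):
    """Draw one weight vector from the fitted Gaussian approximation."""
    return mean + np.sqrt(var) * rng.standard_normal(mean.shape)

def bayesian_model_average(mean, var, predict_fn, x, num_samples=30, seed=0):
    """Average predictions over sampled weight vectors (Bayesian model averaging)."""
    rng = np.random.default_rng(seed)
    preds = [predict_fn(sample_weights(mean, var, rng), x) for _ in range(num_samples)]
    return np.mean(preds, axis=0)
```

In practice, the weight iterates would be snapshots gathered every few epochs towards the end of training with a constant or cyclical learning rate, following the SWA schedule.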
