Efficient Evaluation-Time Uncertainty Estimation by Improved Distillation

In this work we aim to obtain computationally efficient uncertainty estimates with deep networks. To this end, we propose a modified knowledge distillation procedure that achieves state-of-the-art uncertainty estimates for both in-distribution and out-of-distribution samples. Our contributions include: a) demonstrating and adapting to distillation's regularization effect; b) proposing a novel target teacher distribution; c) proposing a simple augmentation procedure that improves out-of-distribution uncertainty estimates; and d) shedding light on the distillation procedure through a comprehensive set of experiments.
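To make the distillation setup concrete, below is a minimal PyTorch sketch of a standard Hinton-style distillation loss with a deep-ensemble teacher, the usual starting point that this work builds on. The temperature, mixing weight, and function names are illustrative assumptions, not the paper's modified procedure.

```python
# Minimal sketch of ensemble-to-student distillation for uncertainty
# estimation. The temperature, mixing weight alpha, and function name
# are illustrative assumptions, not the paper's exact procedure.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits_list, labels,
                      temperature=2.0, alpha=0.5):
    """Cross-entropy on hard labels mixed with KL divergence to the
    averaged (ensemble) teacher distribution at a softened temperature."""
    # Average the ensemble members' temperature-softened predictive
    # distributions to form the soft target.
    teacher_probs = torch.stack(
        [F.softmax(t / temperature, dim=-1) for t in teacher_logits_list]
    ).mean(dim=0)
    # KL(teacher || student) on temperature-scaled student logits;
    # the T^2 factor is the standard gradient rescaling for soft targets.
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        teacher_probs,
        reduction="batchmean",
    ) * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kl + (1.0 - alpha) * ce
```

At evaluation time, the distilled student yields a predictive distribution in a single forward pass, avoiding the cost of running every ensemble member.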
