Boosting Uncertainty Estimation for Deep Neural Classifiers

We consider the problem of uncertainty estimation in the context of (non-Bayesian) deep neural classification. All current methods are based on extracting uncertainty signals from a trained network optimized to solve the classification problem at hand. We demonstrate that such techniques tend to produce biased uncertainty estimates for precisely those instances whose predictions are supposed to be highly confident. This deficiency is an artifact of the dynamics of training with SGD-like optimizers. Based on this observation, we develop an uncertainty estimation algorithm that "peels away" highly confident points sequentially and estimates their confidence using earlier snapshots of the trained model, taken before their uncertainty estimates are jittered. We present extensive experiments indicating that the proposed algorithm provides uncertainty estimates that are consistently better than those of the best known methods.
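
To make the "peeling" idea concrete, below is a minimal sketch of one way such a procedure could look, assuming confidence scores from several training snapshots have already been saved. This is an illustration, not the paper's exact algorithm: the function name peeled_uncertainty, the peel_fraction parameter, and the linear layer-to-snapshot mapping are all assumptions made here for clarity.

```python
import numpy as np


def peeled_uncertainty(conf_by_snapshot, peel_fraction=0.05):
    """Score points by sequentially peeling off the most confident ones.

    conf_by_snapshot: float array of shape (n_snapshots, n_points), where
    row i holds the confidence (e.g., softmax response) that the i-th
    training snapshot assigns to each point, ordered from the earliest
    snapshot (row 0) to the final model (row -1).

    Illustrative assumption: the most confident points under the final
    model are scored with the earliest snapshot (before further training
    jitters their estimates); each subsequent, less confident layer falls
    back to a later snapshot, and the remainder keeps the final scores.
    """
    n_snapshots, n_points = conf_by_snapshot.shape
    final_conf = conf_by_snapshot[-1]
    order = np.argsort(-final_conf)                  # most confident first
    layer_size = max(1, int(peel_fraction * n_points))
    scores = np.array(final_conf)                    # default: final model
    for i, start in enumerate(range(0, n_points, layer_size)):
        snap = min(i, n_snapshots - 1)               # layer i -> snapshot i
        layer = order[start:start + layer_size]
        scores[layer] = conf_by_snapshot[snap, layer]
    return scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toy = rng.uniform(size=(5, 1000))                # 5 snapshots, 1000 points
    print(peeled_uncertainty(toy)[:10])
```

In this sketch the confidence signal itself (softmax response, MC-dropout variance, etc.) is left abstract; only the selection-and-fallback structure of the peeling step is shown.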
