Calibration of Neural Networks using Splines

Calibrating neural networks is of utmost importance when employing them in safety-critical applications where downstream decision-making depends on the predicted probabilities. Measuring calibration error amounts to comparing two empirical distributions. In this work, we introduce a binning-free calibration measure inspired by the classical Kolmogorov-Smirnov (KS) statistical test, whose main idea is to compare the respective cumulative probability distributions. Building on this, we approximate the empirical cumulative distribution with a differentiable spline and thereby obtain a recalibration function that maps the network outputs to actual (calibrated) class-assignment probabilities. The spline fitting is performed on a held-out calibration set, and the resulting recalibration function is evaluated on an unseen test set. We tested our method against existing calibration approaches on various image classification datasets; our spline-based recalibration consistently outperforms existing methods on the KS error as well as other commonly used calibration measures.
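
The abstract compresses two steps that a short sketch can make concrete: (i) the binning-free KS error, i.e. the maximum gap between the cumulative predicted probability and the cumulative empirical accuracy over samples sorted by confidence, and (ii) a spline fit to the cumulative-accuracy curve whose derivative yields a recalibration function. The Python sketch below illustrates one plausible reading of both steps; the function names, knot placement, the interpolation used to place a new score on the calibration set's rank scale, and the clipping to [0, 1] are assumptions of this sketch, not the authors' reference implementation.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def ks_calibration_error(confidences, correct):
    """Binning-free KS error: maximum gap between the cumulative predicted
    probability and the cumulative empirical accuracy, with samples sorted
    by confidence (a sketch of the measure described in the abstract)."""
    order = np.argsort(confidences)
    conf = np.asarray(confidences, float)[order]
    acc = np.asarray(correct, float)[order]
    n = len(conf)
    return np.max(np.abs(np.cumsum(conf) / n - np.cumsum(acc) / n))

def spline_recalibration(confidences, correct, n_knots=7):
    """Fit a cubic spline to the cumulative-accuracy curve of a held-out
    calibration set and return a map from confidence score to recalibrated
    probability (the spline's derivative). Knot placement, the rank-based
    interpolation, and the clipping are assumptions of this sketch."""
    order = np.argsort(confidences)
    conf = np.asarray(confidences, float)[order]
    acc = np.asarray(correct, float)[order]
    n = len(conf)
    u = np.arange(1, n + 1) / n              # rank fraction of each sample
    cum_acc = np.cumsum(acc) / n             # empirical cumulative accuracy
    # Fit the spline through a small set of knots so it stays smooth.
    knot_idx = np.linspace(0, n - 1, n_knots).astype(int)
    spline = CubicSpline(u[knot_idx], cum_acc[knot_idx])
    d_spline = spline.derivative()           # slope = local accuracy
    def recalibrate(score):
        # Place a new score on the calibration set's rank scale, then read
        # off the spline's slope there as the calibrated probability.
        u_new = np.interp(score, conf, u)
        return np.clip(d_spline(u_new), 0.0, 1.0)
    return recalibrate

# Example usage on synthetic, well-calibrated-by-construction predictions:
rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, size=2000)
correct = rng.binomial(1, conf)
print(ks_calibration_error(conf, correct))
remap = spline_recalibration(conf[:1000], correct[:1000])
print(ks_calibration_error(remap(conf[1000:]), correct[1000:]))
```

In this reading, the recalibration map would be fit on the held-out calibration set mentioned in the abstract and then applied to test-set confidences, after which the KS error is recomputed on the recalibrated scores.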
