Beyond temperature scaling: Obtaining well-calibrated multiclass probabilities with Dirichlet calibration

Class probabilities predicted by most multiclass classifiers are uncalibrated, often tending towards over-confidence. With neural networks, calibration can be improved by temperature scaling, a method that learns a single corrective multiplicative factor for the inputs to the last softmax layer. For non-neural models, existing methods apply binary calibration in a pairwise or one-vs-rest fashion. We propose a natively multiclass calibration method applicable to classifiers from any model class, derived from Dirichlet distributions and generalising the beta calibration method from binary classification. It is easily implemented with neural nets, since it is equivalent to log-transforming the uncalibrated probabilities, followed by one linear layer and a softmax. Experiments demonstrate improved probabilistic predictions according to multiple measures (confidence-ECE, classwise-ECE, log-loss, Brier score) across a wide range of datasets and classifiers. Parameters of the learned Dirichlet calibration map provide insight into the biases of the uncalibrated model.
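
The calibration map described above (log-transform, one linear layer, softmax) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names `dirichlet_calibrate` and `temperature_scale`, the clipping constant `eps`, and the parameter names `W` and `b` are assumptions introduced here, and fitting `W` and `b` (for example by minimising log-loss on held-out data) is left out.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax, shifted by the row maximum for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def dirichlet_calibrate(probs, W, b, eps=1e-12):
    # Hypothetical helper: log-transform the uncalibrated probabilities,
    # apply one linear layer (W, b), then a softmax.
    # probs: (n_samples, k); W: (k, k); b: (k,)
    log_p = np.log(np.clip(probs, eps, 1.0))  # clip to avoid log(0)
    return softmax(log_p @ W.T + b)

def temperature_scale(logits, t):
    # Temperature scaling: divide the pre-softmax inputs by one scalar t.
    return softmax(logits / t)

# With W = I and b = 0 the map leaves the probabilities unchanged.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])
identity_out = dirichlet_calibrate(probs, np.eye(3), np.zeros(3))

# With W = I / t and b = 0 the map reproduces temperature scaling,
# because log-probabilities differ from logits only by a per-row constant.
logits = np.array([[2.0, 1.0, 0.1]])
uncalibrated = softmax(logits)
via_dirichlet = dirichlet_calibrate(uncalibrated, np.eye(3) / 2.0, np.zeros(3))
via_temperature = temperature_scale(logits, 2.0)
```

In this sketch, temperature scaling appears as the special case where `W` is a scaled identity matrix and `b` is zero; a full matrix `W` and a non-zero `b` give the map more freedom to correct class-specific biases.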
