Meta-Cal: Well-controlled Post-hoc Calibration by Ranking

In many applications, it is desirable that a classifier not only makes accurate predictions, but also outputs calibrated probabilities. However, many existing classifiers, especially deep neural network classifiers, tend not to be calibrated. Post-hoc calibration is a technique to recalibrate a model, and its goal is to learn a calibration map. Existing approaches mostly focus on constructing calibration maps with low calibration errors. Contrary to these methods, we study post-hoc calibration for multi-class classification under constraints, as a calibrator with a low calibration error does not necessarily mean it is useful in practice. In this paper, we introduce two practical constraints to be taken into consideration. We then present Meta-Cal, which is built from a base calibrator and a ranking model. Under some mild assumptions, two high-probability bounds are given with respect to these constraints. Empirical results on CIFAR-10, CIFAR-100 and ImageNet and a range of popular network architectures show our proposed method significantly outperforms the current state of the art for post-hoc multi-class classification calibration.1

[1]  Jingyi Jessica Li,et al.  Neyman-Pearson classification algorithms and NP receiver operating characteristics , 2016, Science Advances.

[2]  G. Lugosi,et al.  Ranking and empirical minimization of U-statistics , 2006, math/0603123.

[3]  Bianca Zadrozny,et al.  Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers , 2001, ICML.

[4]  Nikos Komodakis,et al.  Wide Residual Networks , 2016, BMVC.

[5]  Tengyu Ma,et al.  Verified Uncertainty Calibration , 2019, NeurIPS.

[6]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[7]  Jianxiong Xiao,et al.  DeepDriving: Learning Affordance for Direct Perception in Autonomous Driving , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[8]  Marc Niethammer,et al.  Local Temperature Scaling for Probability Calibration , 2020, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[9]  Mark J. F. Gales,et al.  Predictive Uncertainty Estimation via Prior Networks , 2018, NeurIPS.

[10]  Yunhong Wang,et al.  Cost-Sensitive Two-Stage Depression Prediction Using Dynamic Visual Clues , 2016, ACCV.

[11]  Hsuan-Tien Lin,et al.  A note on Platt’s probabilistic outputs for support vector machines , 2007, Machine Learning.

[12]  Peter A. Flach,et al.  Beyond temperature scaling: Obtaining well-calibrated multiclass probabilities with Dirichlet calibration , 2019, NeurIPS.

[13]  Bhavya Kailkhura,et al.  Mix-n-Match: Ensemble and Compositional Methods for Uncertainty Calibration in Deep Learning , 2020, ICML.

[14]  Milos Hauskrecht,et al.  Obtaining Well Calibrated Probabilities Using Bayesian Binning , 2015, AAAI.

[15]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Bianca Zadrozny,et al.  Transforming classifier scores into accurate multiclass probability estimates , 2002, KDD.

[17]  Kilian Q. Weinberger,et al.  On Calibration of Modern Neural Networks , 2017, ICML.

[18]  Lorenzo Rosasco,et al.  Dirichlet-based Gaussian Processes for Large-scale Calibrated Classification , 2018, NeurIPS.

[19]  Geoffrey E. Hinton,et al.  Regularizing Neural Networks by Penalizing Confident Output Distributions , 2017, ICLR.

[20]  Andrew Gordon Wilson,et al.  A Simple Baseline for Bayesian Uncertainty in Deep Learning , 2019, NeurIPS.

[21]  Natalia Gimelshein,et al.  PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.

[22]  Andrew Gordon Wilson,et al.  Stochastic Variational Deep Kernel Learning , 2016, NIPS.

[23]  Kilian Q. Weinberger,et al.  Deep Networks with Stochastic Depth , 2016, ECCV.

[24]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[25]  Charles Blundell,et al.  Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles , 2016, NIPS.

[26]  Rudolph Triebel,et al.  Non-Parametric Calibration for Classification , 2019, AISTATS.

[27]  Peter A. Flach,et al.  Beta calibration: a well-founded and easily implemented improvement on logistic calibration for binary classifiers , 2017, AISTATS.

[28]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[29]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[30]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Michael Pfeiffer,et al.  Multi-Class Uncertainty Calibration via Mutual Information Maximization-based Binning , 2021, ICLR.

[32]  Ran El-Yaniv,et al.  Selective Classification for Deep Neural Networks , 2017, NIPS.

[33]  Harikrishna Narasimhan,et al.  On the Relationship Between Binary Classification, Bipartite Ranking, and Binary Class Probability Estimation , 2013, NIPS.

[34]  Gregory N. Hullender,et al.  Learning to rank using gradient descent , 2005, ICML.

[35]  Zhi-Hua Zhou,et al.  On the Consistency of AUC Pairwise Optimization , 2012, IJCAI.