Class-Distribution-Aware Calibration for Long-Tailed Visual Recognition

Despite impressive accuracy, deep neural networks are often miscalibrated and tend to make overly confident predictions. Recent techniques such as temperature scaling (TS) and label smoothing (LS) are effective at producing well-calibrated models by smoothing logits and hard labels, respectively, with scalar factors. However, a uniform TS or LS factor may not be optimal for calibrating models trained on a long-tailed dataset, where the model produces overly confident probabilities for high-frequency classes. In this study, we propose class-distribution-aware TS (CDA-TS) and LS (CDA-LS), which incorporate class frequency information into model calibration under a long-tailed distribution. In CDA-TS, the scalar temperature is replaced with a CDA temperature vector that encodes class frequency to compensate for over-confidence. Similarly, CDA-LS uses a vector smoothing factor and flattens each hard label according to the frequency of its class. We also integrate the CDA optimal temperature vector with the distillation loss, which reduces miscalibration in self-distillation (SD). We empirically show that class-distribution-aware TS and LS can accommodate the imbalanced data distribution, yielding superior performance in both calibration error and predictive accuracy. We also observe that SD with an extremely imbalanced dataset is less effective in terms of calibration performance. Code is available in
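
To make the idea of vector-valued calibration factors concrete, the following is a minimal NumPy sketch, not the authors' implementation. The abstract does not state how class frequency is encoded, so the linear encodings used here are assumptions chosen for illustration, as are the function names and the hyperparameters `base_temp`, `gamma`, and `max_alpha`. The sketch only shows the mechanism: each class gets its own temperature and smoothing factor, and frequent classes are softened more.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def cda_temperature(class_counts, base_temp=1.0, gamma=1.0):
    """Per-class temperature vector (hypothetical encoding).

    Assumption: temperature grows linearly with normalized class
    frequency, so head classes are softened more than tail classes.
    """
    freq = np.asarray(class_counts, dtype=float)
    freq = freq / freq.max()
    return base_temp * (1.0 + gamma * freq)


def cda_ts_probs(logits, temps):
    """Divide each class logit by its own temperature, then softmax."""
    return softmax(np.asarray(logits, dtype=float) / temps)


def cda_smoothed_labels(labels, class_counts, max_alpha=0.2):
    """Soft targets with a per-class smoothing factor (hypothetical encoding).

    Assumption: the smoothing factor is proportional to normalized class
    frequency; the removed mass is spread evenly over the other classes.
    """
    counts = np.asarray(class_counts, dtype=float)
    labels = np.asarray(labels)
    alpha = max_alpha * counts / counts.max()
    k = counts.shape[0]
    per_sample = alpha[labels][:, None]               # shape (n, 1)
    targets = np.tile(per_sample / (k - 1), (1, k))   # shape (n, k)
    targets[np.arange(labels.shape[0]), labels] = 1.0 - alpha[labels]
    return targets


# Toy long-tailed setup: one head class, one medium class, one tail class.
counts = [1000, 100, 10]
temps = cda_temperature(counts)
probs = cda_ts_probs([[4.0, 1.0, 0.0]], temps)
soft = cda_smoothed_labels([0, 2], counts)
```

In this toy setup the head class receives the largest temperature and the largest smoothing factor, so its predicted confidence drops relative to plain softmax while tail-class targets stay close to one-hot. In practice the temperature vector would be fitted on held-out data rather than fixed by hand.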
