Local Temperature Scaling for Probability Calibration

In semantic segmentation, label probabilities are often uncalibrated because they are typically only a by-product of the segmentation task. Intersection over Union (IoU) and the Dice score are the usual criteria for segmentation success, while metrics that assess the label probabilities themselves are rarely explored. Probability calibration approaches, which aim to match predicted probabilities with experimentally observed error rates, have been studied, but they focus mainly on classification rather than semantic segmentation. We therefore propose a learning-based calibration method for multi-label semantic segmentation. Specifically, we adopt a tree-like convolutional neural network to predict local temperature values for probability calibration. One advantage of our approach is that it does not change prediction accuracy, so calibration can be applied as a post-processing step. Experiments on the COCO and LPBA40 datasets demonstrate improved calibration performance across several metrics. We also demonstrate the performance of our method for multi-atlas brain segmentation from magnetic resonance images.
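To illustrate why scaling by a local temperature leaves the predicted labels unchanged, the following is a minimal NumPy sketch, not the authors' implementation. It assumes per-pixel logits of shape (C, H, W) and a strictly positive temperature map of shape (H, W); in the proposed method such a map would be predicted by the tree-like network, whereas here it is filled with random values purely for illustration. The function names `softmax` and `local_temperature_scale` are hypothetical.

```python
import numpy as np


def softmax(logits, axis=0):
    """Numerically stable softmax along the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def local_temperature_scale(logits, temperature):
    """Divide each pixel's logits by that pixel's temperature, then softmax.

    logits:      array of shape (C, H, W), one logit per class and pixel.
    temperature: array of shape (H, W), strictly positive.
    """
    if np.any(temperature <= 0):
        raise ValueError("temperatures must be strictly positive")
    # Broadcasting applies the same temperature to all classes at a pixel.
    return softmax(logits / temperature[None, ...], axis=0)


# Illustrative data only (assumption): random logits and a random
# temperature map standing in for a learned one.
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 4, 5))
temperature = rng.uniform(0.5, 3.0, size=(4, 5))

uncalibrated = softmax(logits, axis=0)
calibrated = local_temperature_scale(logits, temperature)

# Dividing by a positive scalar preserves the ordering of the logits at
# each pixel, so the most likely label is the same before and after.
labels_unchanged = np.array_equal(
    uncalibrated.argmax(axis=0), calibrated.argmax(axis=0)
)
```

Because each pixel's logits are divided by a single positive number, their ranking is preserved and only the confidence changes: temperatures above 1 flatten the distribution at that pixel, and temperatures below 1 sharpen it.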
