Post Training Uncertainty Calibration Of Deep Networks For Medical Image Segmentation

Neural networks for automated image segmentation are typically trained to achieve maximum accuracy, while less attention has been given to the calibration of their confidence scores. However, well-calibrated confidence scores provide valuable information to the user. We investigate several post hoc calibration methods that are straightforward to implement, some of which are novel. These methods are compared to Monte Carlo (MC) dropout and applied to neural networks trained with cross-entropy (CE) and soft Dice (SD) losses on BraTS 2018 and ISLES 2018. Surprisingly, models trained with SD loss are not necessarily less well calibrated than those trained with CE loss. In all cases, at least one post hoc method improves the calibration. The results show limited consistency across settings, so we cannot conclude that any one method is superior. In all cases, post hoc calibration is competitive with MC dropout. Although average calibration improves compared to the base model, subject-level variance of the calibration remains similar.
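To make the notion of calibration concrete, the following is a minimal sketch of one common way to quantify it, a binned expected calibration error (ECE) for binary, voxel-wise foreground probabilities. The abstract does not state which calibration metric is used, so the metric choice, the function name `expected_calibration_error`, and the `n_bins` parameter are assumptions for illustration only.

```python
def expected_calibration_error(confidences, labels, n_bins=10):
    """Binned ECE for binary predictions (illustrative sketch, not the paper's metric).

    confidences: predicted foreground probabilities in [0, 1], one per voxel.
    labels:      ground-truth values (0 = background, 1 = foreground).
    """
    if len(confidences) != len(labels):
        raise ValueError("confidences and labels must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Per-bin running sums of predicted probability and of positive labels.
    sum_conf = [0.0] * n_bins
    sum_pos = [0.0] * n_bins
    count = [0] * n_bins

    for p, y in zip(confidences, labels):
        # Equal-width bins; p == 1.0 is placed in the last bin.
        b = min(int(p * n_bins), n_bins - 1)
        sum_conf[b] += p
        sum_pos[b] += y
        count[b] += 1

    # Weighted gap between mean confidence and observed foreground frequency:
    # (count/n) * |sum_pos/count - sum_conf/count| == |sum_pos - sum_conf| / n
    ece = 0.0
    for s_conf, s_pos, c in zip(sum_conf, sum_pos, count):
        if c:
            ece += abs(s_pos - s_conf) / n
    return ece
```

A perfectly calibrated model scores 0: among voxels predicted at 0.8, 80% should be foreground. Computing such a score per subject rather than over the pooled test set is one way to inspect the subject-level variance mentioned above.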
