Confidence Calibration and Predictive Uncertainty Estimation for Deep Medical Image Segmentation

Fully convolutional neural networks (FCNs), and in particular U-Nets, have achieved state-of-the-art results in semantic segmentation for numerous medical imaging applications. Moreover, batch normalization and Dice loss have been used successfully to stabilize and accelerate training. However, these networks are poorly calibrated, i.e., they tend to produce overconfident predictions for both correct and erroneous classifications, which makes them unreliable and hard to interpret. In this paper, we study predictive uncertainty estimation in FCNs for medical image segmentation. We make the following contributions: 1) we systematically compare cross-entropy loss with Dice loss in terms of segmentation quality and uncertainty estimation of FCNs; 2) we propose model ensembling for confidence calibration of FCNs trained with batch normalization and Dice loss; 3) we assess the ability of calibrated FCNs to predict the segmentation quality of structures and to detect out-of-distribution test examples. We conduct extensive experiments across three medical image segmentation applications, covering the brain, the heart, and the prostate, to evaluate our contributions. The results of this study offer considerable insight into predictive uncertainty estimation and out-of-distribution detection in medical image segmentation, and provide practical recipes for confidence calibration. Moreover, we consistently demonstrate that model ensembling improves confidence calibration.
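As a rough illustration of the ensembling idea, the sketch below averages the per-pixel foreground probabilities of several independently trained models and scores the result with the expected calibration error (ECE). This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not name the calibration metric or the ensemble size, so the choice of ECE, the binary (foreground/background) setting, the 0.5 decision threshold, and the function names `ensemble_probs` and `expected_calibration_error` are all assumptions made here for illustration.

```python
import numpy as np


def ensemble_probs(member_probs):
    """Average foreground probabilities over ensemble members (axis 0).

    Assumption: each member outputs a per-pixel foreground probability map
    of the same shape; simple averaging is used as the combination rule.
    """
    return np.mean(np.asarray(member_probs, dtype=float), axis=0)


def expected_calibration_error(probs, labels, n_bins=10):
    """Binary ECE: bin-weighted gap between accuracy and confidence.

    Confidence is the probability of the predicted class, so it lies in
    [0.5, 1.0]; bins are equally spaced over that interval (an assumption).
    """
    probs = np.asarray(probs, dtype=float).ravel()
    labels = np.asarray(labels).ravel().astype(int)

    preds = (probs >= 0.5).astype(int)
    conf = np.where(preds == 1, probs, 1.0 - probs)
    correct = (preds == labels).astype(float)

    # Interior bin edges; digitize maps each confidence to a bin in 0..n_bins-1.
    edges = np.linspace(0.5, 1.0, n_bins + 1)
    bin_idx = np.digitize(conf, edges[1:-1], right=True)

    ece = 0.0
    for b in range(n_bins):
        mask = bin_idx == b
        if mask.any():
            # mask.mean() is the fraction of pixels that fall in this bin.
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece


if __name__ == "__main__":
    # Hypothetical stand-in data: five noisy "models" predicting one 64x64 mask.
    rng = np.random.default_rng(0)
    labels = (rng.random((64, 64)) > 0.7).astype(int)
    members = [
        np.clip(labels + rng.normal(0.0, 0.6, labels.shape), 0.0, 1.0)
        for _ in range(5)
    ]
    print("single-model ECE:", expected_calibration_error(members[0], labels))
    print("ensemble ECE:    ", expected_calibration_error(ensemble_probs(members), labels))
```

Averaging probabilities rather than hard labels is what lets the ensemble express intermediate confidence where its members disagree; a majority vote over binary masks would discard that information.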
