Confidence Calibration and Predictive Uncertainty Estimation for Deep Medical Image Segmentation

Fully convolutional neural networks (FCNs), and in particular U-Nets, have achieved state-of-the-art results in semantic segmentation for numerous medical imaging applications. Moreover, batch normalization and Dice loss have been used successfully to stabilize and accelerate training. However, these networks are poorly calibrated, i.e., they tend to produce overconfident predictions for both correct and erroneous classifications, which makes them unreliable and hard to interpret. In this paper, we study predictive uncertainty estimation in FCNs for medical image segmentation. We make the following contributions: 1) we systematically compare cross-entropy loss with Dice loss in terms of the segmentation quality and uncertainty estimation of FCNs; 2) we propose model ensembling for confidence calibration of FCNs trained with batch normalization and Dice loss; 3) we assess the ability of calibrated FCNs to predict the segmentation quality of structures and to detect out-of-distribution test examples. We evaluate these contributions through extensive experiments on three medical image segmentation applications, covering the brain, the heart, and the prostate. The results offer considerable insight into predictive uncertainty estimation and out-of-distribution detection in medical image segmentation and provide practical recipes for confidence calibration. We also consistently demonstrate that model ensembling improves confidence calibration.
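To make the quantities named above concrete, here is a minimal pure-Python sketch, written under our own assumptions rather than taken from the paper: a binary (foreground vs. background) setting with each image flattened to per-pixel foreground probabilities, a soft Dice loss and a pixel-wise cross-entropy, ensembling by plain averaging of member probabilities, expected calibration error (ECE) with 10 equal-width confidence bins, and mean per-pixel entropy as one possible image-level uncertainty score. All function names (`soft_dice_loss`, `cross_entropy`, `ensemble_average`, `expected_calibration_error`, `mean_entropy`) are hypothetical.

```python
import math
from typing import List, Sequence

# Illustrative sketch only (hypothetical helpers, not the paper's code):
# binary segmentation, each image flattened to per-pixel foreground probabilities.


def soft_dice_loss(probs: Sequence[float], labels: Sequence[int], eps: float = 1e-6) -> float:
    """Soft Dice loss, 1 - 2*sum(p*g) / (sum(p) + sum(g)), computed on probabilities."""
    intersection = sum(p * g for p, g in zip(probs, labels))
    denominator = sum(probs) + sum(labels)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


def cross_entropy(probs: Sequence[float], labels: Sequence[int], eps: float = 1e-12) -> float:
    """Mean per-pixel binary cross-entropy (probabilities clipped to avoid log(0))."""
    total = 0.0
    for p, g in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= g * math.log(p) + (1 - g) * math.log(1.0 - p)
    return total / len(probs)


def ensemble_average(member_probs: List[Sequence[float]]) -> List[float]:
    """Average per-pixel foreground probabilities over independently trained models."""
    n_members = len(member_probs)
    return [sum(pixel) / n_members for pixel in zip(*member_probs)]


def expected_calibration_error(probs: Sequence[float], labels: Sequence[int], n_bins: int = 10) -> float:
    """Pixel-wise ECE with equal-width confidence bins over [0, 1].

    Confidence is the probability assigned to the predicted class; a perfectly
    calibrated model is correct on 80% of the pixels it predicts with confidence 0.8.
    """
    counts = [0] * n_bins
    conf_sums = [0.0] * n_bins
    correct_sums = [0.0] * n_bins
    for p, g in zip(probs, labels):
        pred = 1 if p >= 0.5 else 0
        confidence = p if pred == 1 else 1.0 - p
        b = min(int(confidence * n_bins), n_bins - 1)
        counts[b] += 1
        conf_sums[b] += confidence
        correct_sums[b] += 1.0 if pred == g else 0.0
    ece = 0.0
    for count, conf_sum, correct_sum in zip(counts, conf_sums, correct_sums):
        if count:
            # Gap between accuracy and mean confidence, weighted by the bin's pixel share.
            ece += (count / len(probs)) * abs(correct_sum / count - conf_sum / count)
    return ece


def mean_entropy(probs: Sequence[float], eps: float = 1e-12) -> float:
    """Mean per-pixel binary entropy in nats: one possible image-level uncertainty score."""
    total = 0.0
    for p in probs:
        p = min(max(p, eps), 1.0 - eps)
        total -= p * math.log(p) + (1.0 - p) * math.log(1.0 - p)
    return total / len(probs)


if __name__ == "__main__":
    # Hand-built toy case: two overconfident members that err on different pixels.
    labels = [1, 1, 0, 0]
    member_a = [0.95, 0.05, 0.05, 0.05]  # wrong on pixel 2 with confidence 0.95
    member_b = [0.95, 0.95, 0.95, 0.05]  # wrong on pixel 3 with confidence 0.95
    ensemble = ensemble_average([member_a, member_b])
    for name, probs in [("A", member_a), ("B", member_b), ("ensemble", ensemble)]:
        print(name,
              "dice_loss=%.3f" % soft_dice_loss(probs, labels),
              "ce=%.3f" % cross_entropy(probs, labels),
              "ece=%.3f" % expected_calibration_error(probs, labels),
              "entropy=%.3f" % mean_entropy(probs))
```

In this hand-built toy case each member alone has an ECE of about 0.2, while the averaged prediction has an ECE of about 0.025: averaging lowers confidence precisely on the pixels where the members disagree, which also raises the mean entropy there. The toy case only illustrates the mechanism; the paper's findings come from its experiments on real data.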
