On the Relationship Between Calibrated Predictors and Unbiased Volume Estimation

Machine-learning-driven medical image segmentation has become standard in medical image analysis. However, deep learning models are prone to overconfident predictions. This has led to a renewed focus on calibrated predictions in the medical imaging and broader machine learning communities. Calibrated predictions are estimates of the probability of a label that correspond to the true expected value of the label, conditioned on the confidence. For example, among all pixels assigned a foreground score of 0.8, 80% should actually belong to the foreground. Such calibrated predictions are useful in a range of medical imaging applications, including surgical planning under uncertainty and active learning systems. At the same time, for many medical applications it is an accurate volume measurement that matters most. This work investigates the relationship between model calibration and volume estimation. We demonstrate both mathematically and empirically that if the predictor is calibrated per image, the correct volume is obtained in expectation by summing the probability scores over all pixels/voxels of the image. Furthermore, we show that convex combinations of calibrated classifiers preserve unbiased volume estimation but do not preserve calibration. We therefore conclude that having a calibrated predictor is a sufficient but not necessary condition for obtaining an unbiased estimate of the volume. We validate our theoretical findings empirically on a collection of 18 different (calibrated) training strategies on two tasks: glioma volume estimation on the BraTS 2018 dataset and ischemic stroke lesion volume estimation on the ISLES 2018 dataset.
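The two claims can be checked on a toy example. If, within one image, every group of pixels sharing a score c has a foreground rate of exactly c, then summing scores group by group gives c times the group size, which is that group's foreground count, so the summed scores equal the true volume. Summing is linear, so any convex combination of two such predictors gives the same sum, while the mixed scores no longer match the foreground rates of their groups. The sketch below is a minimal, deterministic illustration, not the authors' code. It assumes an empirical, finite-pixel notion of per-image calibration, a hypothetical 10-pixel image, and hypothetical helper names (`soft_volume`, `max_calibration_gap`, `convex_combination`). Exact fractions avoid floating-point noise.

```python
from collections import defaultdict
from fractions import Fraction as F


def soft_volume(scores):
    """Volume estimate: sum of per-pixel foreground probabilities."""
    return sum(scores, F(0))


def max_calibration_gap(scores, labels):
    """Largest |empirical foreground rate - score| over pixels sharing a score.

    A gap of 0 means the predictor is (empirically) calibrated on this image.
    """
    groups = defaultdict(list)
    for s, y in zip(scores, labels):
        groups[s].append(y)
    return max(abs(F(sum(ys), len(ys)) - s) for s, ys in groups.items())


def convex_combination(a, b, lam):
    """Pixel-wise mixture lam * a + (1 - lam) * b of two predictors."""
    return [lam * x + (1 - lam) * y for x, y in zip(a, b)]


# Hypothetical 10-pixel "image" with 5 foreground pixels (true volume = 5).
labels = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]

# Predictor A: 0.8 on the first five pixels (4 of 5 are foreground) and
# 0.2 on the last five (1 of 5 is foreground), so it is calibrated per image.
pred_a = [F(4, 5)] * 5 + [F(1, 5)] * 5
# Predictor B: the constant foreground rate 0.5, also calibrated per image.
pred_b = [F(1, 2)] * 10
# Equal-weight convex combination of A and B: scores 13/20 and 7/20.
mixed = convex_combination(pred_a, pred_b, F(1, 2))

for name, p in [("A", pred_a), ("B", pred_b), ("mix", mixed)]:
    print(name, soft_volume(p), max_calibration_gap(p, labels))
```

All three predictors report a volume of 5, but only A and B have a calibration gap of 0. The mixture is off by 3/20 in each score group, which mirrors the claim that the volume estimate survives mixing while calibration does not.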