Optimization with soft Dice can lead to a volumetric bias

Segmentation is a fundamental task in medical image analysis, and the clinical interest is often in measuring the volume of a structure. To evaluate and compare segmentation methods, the similarity between a segmentation and a predefined ground truth is measured using metrics such as the Dice score. Recent segmentation methods based on convolutional neural networks use a differentiable surrogate of the Dice score, such as soft Dice, directly as the loss function during the learning phase. Even though this approach leads to improved Dice scores, we find, both theoretically and empirically on four medical tasks, that it can introduce a volumetric bias for tasks with high inherent uncertainty. This bias may limit the method's clinical applicability.
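The effect described above can be illustrated with a small numerical sketch. This is a toy setup, not the experiments or the derivation from the paper: it assumes that every voxel is foreground independently with the same probability `p_true` (a crude stand-in for "high inherent uncertainty"), that the model outputs a single constant probability `q` for all voxels, and that soft Dice takes the common form 2·Σ(pred·target) / (Σpred + Σtarget). The names `soft_dice`, `cross_entropy`, `p_true` and `candidates` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def soft_dice(pred, target, eps=1e-7):
    """Soft Dice between a probabilistic prediction and a binary target."""
    intersection = np.sum(pred * target)
    return (2.0 * intersection + eps) / (np.sum(pred) + np.sum(target) + eps)


def cross_entropy(pred, target, eps=1e-7):
    """Mean binary cross-entropy, with clipping to avoid log(0)."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))


# Assumed toy data: each voxel is foreground with probability p_true.
rng = np.random.default_rng(0)
n_voxels, n_samples, p_true = 1000, 200, 0.3
targets = (rng.random((n_samples, n_voxels)) < p_true).astype(float)

# Try constant predictions q on a grid and average each criterion
# over the sampled ground truths.
candidates = np.linspace(0.05, 1.0, 20)
mean_dice, mean_ce = [], []
for q in candidates:
    pred = np.full(n_voxels, q)
    mean_dice.append(np.mean([soft_dice(pred, t) for t in targets]))
    mean_ce.append(np.mean([cross_entropy(pred, t) for t in targets]))

best_q_dice = candidates[int(np.argmax(mean_dice))]  # maximises soft Dice
best_q_ce = candidates[int(np.argmin(mean_ce))]      # minimises cross-entropy

expected_volume = n_voxels * p_true
print("expected true volume:       ", expected_volume)
print("volume at soft Dice optimum:", n_voxels * best_q_dice)
print("volume at cross-entropy opt:", n_voxels * best_q_ce)
```

In this toy setting the cross-entropy optimum stays close to the true foreground probability, so the predicted volume (the sum of predicted probabilities) matches the expected volume, whereas the soft Dice optimum is pushed to the upper end of the grid and the predicted volume overshoots. Real networks and datasets are far less idealised, so this should be read only as an intuition for how a volumetric bias can arise.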
