Optimization for Medical Image Segmentation: Theory and Practice When Evaluating With Dice Score or Jaccard Index

In many medical imaging and classical computer vision tasks, segmentation performance is evaluated with the Dice score and the Jaccard index. Despite the existence and great empirical success of metric-sensitive losses, i.e. relaxations of these metrics such as soft Dice, soft Jaccard and Lovász-Softmax, many researchers still train CNNs for segmentation with per-pixel losses such as (weighted) cross-entropy. As a result, the target metric is in many cases not directly optimized. From a theoretical perspective, we investigate the relations within the group of metric-sensitive loss functions and question the existence of an optimal weighting scheme for weighted cross-entropy to optimize the Dice score and Jaccard index at test time. We find that the Dice score and Jaccard index approximate each other relatively and absolutely, but we find no such approximation for a weighted Hamming similarity. For the Tversky loss, the approximation gets monotonically worse when deviating from the trivial weight setting where soft Tversky equals soft Dice. We verify these results empirically in an extensive validation on six medical segmentation tasks and confirm that metric-sensitive losses are superior to cross-entropy based loss functions when evaluating with the Dice score or Jaccard index. This also holds in a multi-class setting, and across different object sizes and foreground/background ratios. These results encourage a wider adoption of metric-sensitive loss functions for medical segmentation tasks where the performance measure of interest is the Dice score or Jaccard index.
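To make the relation between these quantities concrete, here is a minimal sketch in plain Python of soft Dice, soft Jaccard and soft Tversky scores for a binary segmentation. It assumes the common formulation of the Tversky index as TP / (TP + alpha * FP + beta * FN) with soft (probability-weighted) counts. The function names, the toy data and the convention of returning 1.0 for an empty denominator are illustrative assumptions, not taken from the paper. Under these definitions, alpha = beta = 0.5 recovers soft Dice, alpha = beta = 1 recovers soft Jaccard, and the two scores are tied by J = D / (2 - D), which follows from elementary algebra.

```python
import math


def soft_counts(probs, labels):
    """Soft true positives, false positives and false negatives."""
    tp = sum(p * y for p, y in zip(probs, labels))
    fp = sum(p * (1 - y) for p, y in zip(probs, labels))
    fn = sum((1 - p) * y for p, y in zip(probs, labels))
    return tp, fp, fn


def soft_tversky(probs, labels, alpha, beta):
    """Soft Tversky score: TP / (TP + alpha * FP + beta * FN)."""
    tp, fp, fn = soft_counts(probs, labels)
    denom = tp + alpha * fp + beta * fn
    if denom == 0:
        # Assumed convention: empty prediction and empty ground truth
        # count as a perfect match.
        return 1.0
    return tp / denom


def soft_dice(probs, labels):
    """Soft Dice is the Tversky score with alpha = beta = 0.5."""
    return soft_tversky(probs, labels, 0.5, 0.5)


def soft_jaccard(probs, labels):
    """Soft Jaccard is the Tversky score with alpha = beta = 1."""
    return soft_tversky(probs, labels, 1.0, 1.0)


# Toy example (hypothetical values): four pixels, two of them foreground.
probs = [0.9, 0.8, 0.3, 0.1]
labels = [1, 1, 0, 0]

d = soft_dice(probs, labels)
j = soft_jaccard(probs, labels)

# The identity J = D / (2 - D) holds up to floating-point error.
print(d, j, math.isclose(j, d / (2 - d)))
```

Because J = D / (2 - D) and 0 <= D <= 1, the Jaccard score always lies between D / 2 and D. Losses of the form 1 - score can be built from any of these functions. Moving alpha and beta away from 0.5 changes how false positives and false negatives are traded off.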
