Jaccard Metric Losses: Optimizing the Jaccard Index with Soft Labels

Intersection over Union (IoU) losses are surrogates that directly optimize the Jaccard index. Using IoU losses as part of the loss function has been shown to give better performance in semantic segmentation than optimizing pixel-wise losses such as the cross-entropy loss alone. However, we identify a lack of flexibility in these losses: because they cannot process soft labels, they do not support vital training techniques like label smoothing, knowledge distillation, and semi-supervised learning. To address this, we introduce Jaccard Metric Losses (JMLs), which are identical to the soft Jaccard loss in the standard setting with hard labels but are fully compatible with soft labels. We apply JMLs to three prominent use cases of soft labels (label smoothing, knowledge distillation, and semi-supervised learning) and demonstrate their potential to enhance model accuracy and calibration. Our experiments show consistent improvements over the cross-entropy loss across 4 semantic segmentation datasets (Cityscapes, PASCAL VOC, ADE20K, DeepGlobe Land) and 13 architectures, including classic CNNs and recent vision transformers. Remarkably, our straightforward approach significantly outperforms state-of-the-art knowledge distillation and semi-supervised learning methods. The code is available at https://github.com/zifuwanggg/JDTLosses.
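The difference between hard and soft labels can be illustrated with a small sketch. The abstract does not state the loss formulas, so both functions below are assumptions: `soft_jaccard_loss` uses the common inner-product form of the soft Jaccard loss, and `l1_jaccard_loss` is one L1-based form that coincides with it on hard labels; it is not necessarily the exact definition used by the authors. The function names and toy vectors are hypothetical, and inputs are assumed to be non-negative with a non-zero sum.

```python
def soft_jaccard_loss(x, y):
    """Assumed soft Jaccard loss: 1 - <x, y> / (|x|_1 + |y|_1 - <x, y>)."""
    inter = sum(a * b for a, b in zip(x, y))
    union = sum(x) + sum(y) - inter
    return 1.0 - inter / union


def l1_jaccard_loss(x, y):
    """Assumed L1-based variant:
    1 - (|x|_1 + |y|_1 - |x - y|_1) / (|x|_1 + |y|_1 + |x - y|_1).
    """
    total = sum(x) + sum(y)
    diff = sum(abs(a - b) for a, b in zip(x, y))
    return 1.0 - (total - diff) / (total + diff)


# Toy per-pixel foreground probabilities (hypothetical values).
pred = [0.9, 0.2, 0.6, 0.1]
hard = [1.0, 0.0, 1.0, 0.0]   # hard (one-hot) label
soft = [0.9, 0.2, 0.6, 0.1]   # soft label equal to the prediction

# Hard labels: both losses give the same value.
print(soft_jaccard_loss(pred, hard), l1_jaccard_loss(pred, hard))

# Soft labels: the inner-product form stays positive even though the
# prediction matches the label exactly, while the L1 form is zero.
print(soft_jaccard_loss(pred, soft), l1_jaccard_loss(pred, soft))
```

With a hard label, |x - y| = x + y - 2xy holds elementwise, which is why the two expressions agree there. With a soft label, the inner-product form is not minimized when the prediction equals the label, which is the incompatibility the abstract refers to.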
