Optimization of the Jaccard index for image segmentation with the Lovász hinge

The Jaccard loss, also known as the intersection-over-union loss, is commonly used to evaluate segmentation quality because of its better perceptual quality and its scale invariance, which gives small objects appropriate weight compared with per-pixel losses. We present a method for direct optimization of the per-image intersection-over-union loss in neural networks, in the context of semantic image segmentation, based on a convex surrogate: the Lovász hinge. The loss is shown to perform better with respect to the Jaccard index measure than other losses traditionally used in semantic segmentation, such as cross-entropy. We develop a specialized optimization method, based on an efficient computation of the proximal operator of the Lovász hinge, yielding reliably faster and more stable optimization than alternatives. We demonstrate the effectiveness of the method by showing substantially improved intersection-over-union segmentation scores on the Pascal VOC dataset using a state-of-the-art deep learning segmentation architecture.
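As a rough illustration of the quantities named above, the following Python sketch computes the Jaccard loss between two binary masks and one common formulation of the Lovász hinge for a single binary (foreground/background) image. It is a sketch under stated assumptions, not the authors' implementation: the function names `jaccard_loss` and `lovasz_hinge`, the flat-list input format, and the choice of thresholding the hinge errors at zero are all assumptions made for illustration, and the proximal-operator optimization described in the abstract is not shown.

```python
def jaccard_loss(pred, truth):
    """Return 1 - |pred AND truth| / |pred OR truth| for 0/1 masks (flat lists)."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    # Assumption: two empty masks are treated as a perfect match.
    return 0.0 if union == 0 else 1.0 - inter / union


def lovasz_hinge(scores, truth):
    """Lovász hinge surrogate of the Jaccard loss for one binary image.

    scores: real-valued per-pixel outputs (positive means foreground).
    truth:  per-pixel ground-truth labels in {0, 1}.
    """
    # Hinge errors: 1 - score * sign, with sign = +1 (foreground) or -1 (background).
    errors = [1.0 - s * (2 * t - 1) for s, t in zip(scores, truth)]
    # Visit pixels from the largest error to the smallest.
    order = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)

    n_fg = sum(truth)
    loss, prev, fg_seen, bg_seen = 0.0, 0.0, 0, 0
    for i in order:
        if truth[i]:
            fg_seen += 1  # a missed foreground pixel shrinks the intersection
        else:
            bg_seen += 1  # a false positive grows the union
        # Jaccard loss if exactly the pixels visited so far were mispredicted.
        cur = 1.0 - (n_fg - fg_seen) / (n_fg + bg_seen)
        # The increment of the set function weights this pixel's clipped error.
        loss += max(errors[i], 0.0) * (cur - prev)
        prev = cur
    return loss


if __name__ == "__main__":
    print(jaccard_loss([1, 1, 0, 0], [1, 0, 1, 0]))
    print(lovasz_hinge([2.0, -2.0, 3.0, -3.0], [1, 0, 1, 0]))
```

In this formulation the surrogate is zero when every pixel is classified correctly with a margin of at least 1, and each pixel's contribution is weighted by how much mispredicting it would increase the Jaccard loss, which is why errors on small objects are not swamped by the background.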
