The Lovász Hinge: A Convex Surrogate for Submodular Losses

Learning with non-modular losses is an important problem when sets of predictions are made simultaneously. The main tools for constructing convex surrogate loss functions for set prediction are margin rescaling and slack rescaling. In this work, we show that these strategies lead to tight convex surrogates iff the underlying loss function is increasing in the number of incorrect predictions. However, gradient or cutting-plane computation for these functions is NP-hard for non-supermodular loss functions. We propose instead a novel surrogate loss function for submodular losses, the Lovász hinge, which leads to O(p log p) complexity with O(p) oracle accesses to the loss function to compute a gradient or cutting-plane. We prove that the Lovász hinge is convex and yields an extension. As a result, we have developed the first tractable convex surrogates in the literature for submodular losses. We demonstrate the utility of this novel convex surrogate through several set prediction tasks, including on the PASCAL VOC and Microsoft COCO datasets.

[1]  Chun-Liang Li,et al.  Condensed Filter Tree for Cost-Sensitive Multi-Label Classification , 2014, ICML.

[2]  Ambuj Tewari,et al.  On the Consistency of Multiclass Classification Methods , 2007, J. Mach. Learn. Res..

[3]  Mandy Eberhart,et al.  Decision Forests For Computer Vision And Medical Image Analysis , 2016 .

[4]  Xiang Zhang,et al.  OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.

[5]  Ben Taskar,et al.  Max-Margin Markov Networks , 2003, NIPS.

[6]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .

[7]  Hiroki Arimura,et al.  An Efficient Algorithm for Enumerating Closed Patterns in Transaction Databases , 2004, Discovery Science.

[8]  Thorsten Joachims,et al.  Training structural SVMs when exact inference is intractable , 2008, ICML '08.

[9]  David Liben-Nowell,et al.  The link-prediction problem for social networks , 2007 .

[10]  Christoph H. Lampert,et al.  Learning to Localize Objects with Structured Output Regression , 2008, ECCV.

[11]  Martial Hebert,et al.  Toward Objective Evaluation of Image Segmentation Algorithms , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Francis R. Bach,et al.  Structured sparsity-inducing norms through submodular functions , 2010, NIPS.

[13]  Jun Yu,et al.  HC-Search for Multi-Label Prediction: An Empirical Study , 2014, AAAI.

[14]  Jack Edmonds,et al.  Matroids and the greedy algorithm , 1971, Math. Program..

[15]  Michael I. Jordan,et al.  Convexity, Classification, and Risk Bounds , 2006 .

[16]  László Lovász,et al.  Submodular functions and convexity , 1982, ISMP.

[17]  Stefan Carlsson,et al.  CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[18]  Vladimir Vapnik,et al.  The Nature of Statistical Learning , 1995 .

[19]  Gökhan BakIr,et al.  Generalization Bounds and Consistency for Structured Labeling , 2007 .

[20]  Thorsten Joachims,et al.  Cutting-plane training of structural SVMs , 2009, Machine Learning.

[21]  Matthew B. Blaschko,et al.  Learning Submodular Losses with the Lovasz Hinge , 2015, ICML.

[22]  Rishabh K. Iyer,et al.  The Lovasz-Bregman Divergence and connections to rank aggregation, clustering, and web ranking , 2013, UAI.

[23]  Sebastian Nowozin,et al.  Optimal Decisions from Probabilistic Models: The Intersection-over-Union Case , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  M. L. Fisher,et al.  An analysis of approximations for maximizing submodular set functions—I , 1978, Math. Program..

[25]  Satoru Fujishige,et al.  Submodular functions and optimization , 1991 .

[26]  Yoshua. Bengio,et al.  Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..

[27]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[28]  Thomas Hofmann,et al.  Large Margin Methods for Structured and Interdependent Output Variables , 2005, J. Mach. Learn. Res..

[29]  Eyke Hüllermeier,et al.  Bayes Optimal Multilabel Classification via Probabilistic Classifier Chains , 2010, ICML.

[30]  Andreas Krause,et al.  Submodular Function Maximization , 2014, Tractability.

[31]  Tibério S. Caetano,et al.  Submodular Multi-Label Learning , 2011, NIPS.