A Convex Surrogate Operator for General Non-Modular Loss Functions

Empirical risk minimization frequently employs convex surrogates to underlying discrete loss functions in order to achieve computational tractability during optimization. However, classical convex surrogates can only tightly bound modular, submodular, or supermodular loss functions separately while maintaining polynomial-time computation. In this work, a novel generic convex surrogate for general non-modular loss functions is introduced, which provides for the first time a tractable solution for loss functions that are neither supermodular nor submodular. This convex surrogate is based on a submodular-supermodular decomposition, whose existence and uniqueness are proven in this paper. It takes the sum of two convex surrogates that separately bound the supermodular component and the submodular component using slack rescaling and the Lovász hinge, respectively. It is further proven that this surrogate is convex, piecewise linear, an extension of the loss function, and admits polynomial-time subgradient computation. Empirical results are reported on a non-submodular loss based on the Sørensen-Dice difference function and on a real-world face track dataset with tens of thousands of frames, demonstrating the improved performance, efficiency, and scalability of the novel convex surrogate.
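The notions the abstract relies on can be made concrete with a small sketch. The code below is an illustration only, not the paper's operator: it checks submodularity and supermodularity by brute force (exponential in the ground-set size, so suitable only for toy cases) and evaluates the Lovász extension, which by a standard result is convex exactly when the set function is submodular. All function names and the toy loss are hypothetical; the toy loss is simply written as a sum of a submodular and a supermodular part and is not claimed to be the unique decomposition proven in the paper.

```python
from itertools import combinations
from math import sqrt


def powerset(n):
    """Yield every subset of {0, ..., n-1} as a frozenset."""
    for k in range(n + 1):
        for combo in combinations(range(n), k):
            yield frozenset(combo)


def is_submodular(f, n, tol=1e-12):
    """Brute-force check of f(A) + f(B) >= f(A | B) + f(A & B) for all A, B."""
    subsets = list(powerset(n))
    return all(
        f(a) + f(b) >= f(a | b) + f(a & b) - tol
        for a in subsets
        for b in subsets
    )


def is_supermodular(f, n, tol=1e-12):
    """f is supermodular exactly when -f is submodular."""
    return is_submodular(lambda s: -f(s), n, tol)


def lovasz_extension(f, s):
    """Evaluate the Lovász extension of f (with f(empty set) = 0) at vector s."""
    # Visit coordinates in decreasing order of s and accumulate marginal gains.
    order = sorted(range(len(s)), key=lambda j: -s[j])
    value, prefix, prev = 0.0, frozenset(), f(frozenset())
    for j in order:
        prefix = prefix | {j}
        cur = f(prefix)
        value += s[j] * (cur - prev)
        prev = cur
    return value


# Hypothetical toy loss on 4 elements (an assumption, not from the paper).
def sub_part(a):
    return sqrt(len(a & {0, 1}))   # concave in a cardinality: submodular


def sup_part(a):
    return len(a & {2, 3}) ** 2    # convex in a cardinality: supermodular


def toy_loss(a):
    return sub_part(a) + sup_part(a)


if __name__ == "__main__":
    print(is_submodular(sub_part, 4), is_supermodular(sup_part, 4))
    print(is_submodular(toy_loss, 4), is_supermodular(toy_loss, 4))
    print(lovasz_extension(toy_loss, [1.0, 0.0, 1.0, 0.0]), toy_loss(frozenset({0, 2})))
```

The toy loss is neither submodular nor supermodular, which is the situation the abstract targets. Evaluating the Lovász extension at the indicator vector of a set returns the set function's value there, which is the "extension" property in its simplest form.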
