Learning Surrogate Losses

The minimization of loss functions is the heart and soul of Machine Learning. In this paper, we propose an off-the-shelf optimization approach that can seamlessly minimize virtually any non-differentiable and non-decomposable loss function (e.g., Misclassification Rate, AUC, F1, Jaccard Index, Matthews Correlation Coefficient). Our strategy learns smooth relaxations of the true losses by approximating them with a surrogate neural network. The proposed loss networks are set-wise models that are invariant to the order of mini-batch instances. Ultimately, the surrogate losses are learned jointly with the prediction model via bilevel optimization. Empirical results on multiple datasets with diverse real-life loss functions, compared against state-of-the-art baselines, demonstrate the effectiveness of learning surrogate losses.
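
To make the idea concrete, below is a minimal sketch (not the authors' code) of the approach as the abstract describes it: a permutation-invariant surrogate network is fit to a non-differentiable metric (here, F1 on a mini-batch), and the prediction model is updated through the smooth surrogate. The alternating inner/outer updates are a simplified stand-in for the bilevel optimization mentioned above; all names (e.g., SurrogateLossNet, true_f1_loss) are illustrative assumptions.

```python
# Hedged sketch: set-wise surrogate loss network + alternating (bilevel-style) training.
import torch
import torch.nn as nn


class SurrogateLossNet(nn.Module):
    """Permutation-invariant surrogate: per-instance encoder + mean pooling."""
    def __init__(self, hidden=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(2, hidden), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 1))

    def forward(self, y_pred, y_true):
        # Each instance is the pair (prediction, label); mean pooling over the
        # mini-batch makes the output invariant to the order of instances.
        pairs = torch.stack([y_pred, y_true], dim=-1)   # (B, 2)
        pooled = self.encoder(pairs).mean(dim=0)        # (hidden,)
        return self.head(pooled).squeeze()              # scalar surrogate loss


def true_f1_loss(y_pred, y_true, eps=1e-8):
    """Non-differentiable target metric: 1 - F1 on thresholded predictions."""
    hard = (y_pred > 0.5).float()
    tp = (hard * y_true).sum()
    f1 = 2 * tp / (hard.sum() + y_true.sum() + eps)
    return 1.0 - f1


# Toy data and prediction model (illustrative only).
torch.manual_seed(0)
X = torch.randn(512, 10)
y = (X[:, 0] + 0.3 * torch.randn(512) > 0).float()
model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
loss_net = SurrogateLossNet()
opt_model = torch.optim.Adam(model.parameters(), lr=1e-3)
opt_loss = torch.optim.Adam(loss_net.parameters(), lr=1e-3)

for step in range(500):
    idx = torch.randint(0, 512, (64,))
    xb, yb = X[idx], y[idx]
    y_pred = model(xb).squeeze(-1)

    # Step 1: fit the surrogate to the true (non-differentiable) metric.
    surrogate = loss_net(y_pred.detach(), yb)
    fit_loss = (surrogate - true_f1_loss(y_pred.detach(), yb)) ** 2
    opt_loss.zero_grad(); fit_loss.backward(); opt_loss.step()

    # Step 2: update the prediction model through the smooth surrogate.
    model_loss = loss_net(model(xb).squeeze(-1), yb)
    opt_model.zero_grad(); model_loss.backward(); opt_model.step()
```

The mean pooling is one simple way to obtain set-wise invariance; the key point is that the surrogate provides gradients for a metric that is itself piecewise constant in the model parameters.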
