A Unified Framework of Surrogate Loss by Refactoring and Interpolation

We introduce UniLoss, a unified framework for generating surrogate losses to train deep networks with gradient descent, reducing the need to manually design task-specific surrogate losses. Our key observation is that, in many cases, evaluating a model with a performance metric on a batch of examples can be refactored into four steps: from inputs to real-valued scores, from scores to comparisons between pairs of scores, from comparisons to binary variables, and from binary variables to the final performance metric. Given this refactoring, we generate a differentiable approximation of each non-differentiable step through interpolation. With UniLoss, we can optimize for different tasks and metrics within one unified framework, achieving performance comparable to that of task-specific losses. We validate the effectiveness of UniLoss on three tasks and four datasets. Code is available at this https URL.
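
To make the refactoring concrete, here is a minimal PyTorch sketch of how the two non-differentiable steps might be relaxed: hard pairwise comparisons become sigmoids of score differences, and the metric is interpolated between binary configurations where its value is known, using inverse-distance (Shepard-style) weighting. This is an illustrative reconstruction under stated assumptions, not the authors' released code; the function name `uniloss_sketch`, the temperature `tau`, and the particular choice of anchor configurations are hypothetical.

```python
import torch

def uniloss_sketch(scores, pairs, anchor_bits, anchor_values, tau=1.0):
    """Illustrative surrogate loss via relaxation + interpolation.

    scores:        (n,) real-valued model outputs for a mini-batch
    pairs:         (m, 2) LongTensor of index pairs (i, j); the metric is
                   determined by the comparisons 1[s_i > s_j]
    anchor_bits:   (k, m) float tensor of binary configurations at which
                   the true metric value is known
    anchor_values: (k,) metric value at each anchor configuration
    tau:           temperature of the sigmoid relaxation (assumed knob)
    """
    # Scores -> comparisons -> soft binary variables:
    # relax each hard comparison 1[s_i > s_j] into a sigmoid in (0, 1).
    diffs = scores[pairs[:, 0]] - scores[pairs[:, 1]]        # (m,)
    soft_bits = torch.sigmoid(diffs / tau)                   # (m,)

    # Binary variables -> metric: interpolate between anchor configurations
    # with inverse-distance weights, so the result is differentiable in
    # soft_bits and hence in the scores.
    dists = torch.cdist(soft_bits.unsqueeze(0), anchor_bits).squeeze(0)  # (k,)
    weights = 1.0 / (dists ** 2 + 1e-8)
    weights = weights / weights.sum()
    soft_metric = (weights * anchor_values).sum()

    # Minimize the negated metric (assuming higher metric is better).
    return -soft_metric
```

The inverse-distance weighting is one simple interpolator with the needed property: it matches the known metric values at the anchor configurations while remaining differentiable everywhere else, so gradients can flow from the batch-level metric back to the individual scores.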
