MetricOpt: Learning to Optimize Black-Box Evaluation Metrics

We study the problem of directly optimizing arbitrary non-differentiable task evaluation metrics, such as misclassification rate and recall. Our method, named MetricOpt, operates in a black-box setting where the computational details of the target metric are unknown. We achieve this by learning a differentiable value function, which maps compact task-specific model parameters to metric observations. The learned value function plugs easily into existing optimizers such as SGD and Adam, and is effective for rapidly fine-tuning a pre-trained model. This leads to consistent improvements: the value function provides effective metric supervision during fine-tuning and helps to correct the potential bias of loss-only supervision. MetricOpt achieves state-of-the-art performance on a variety of metrics for (image) classification, image retrieval and object detection, with solid gains over competing methods, which often involve complex loss design or adaptation. MetricOpt also generalizes well to new tasks and model architectures.
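To make the mechanism concrete, below is a minimal PyTorch sketch of the idea as described in the abstract: a learned, differentiable value function V(theta) maps a compact set of task-specific parameters theta to a predicted metric value, and its gradient supplements the ordinary loss gradient during fine-tuning. Everything here (the names `ValueFunction`, `fit_value_fn`, `finetune_step`, the MLP surrogate, the `alpha` weighting) is an illustrative assumption, not the paper's actual implementation.

```python
# Hedged sketch, not the paper's method: an MLP surrogate for a
# black-box metric, used to add approximate metric gradients to a
# standard fine-tuning loop.
import torch
import torch.nn as nn


class ValueFunction(nn.Module):
    """Differentiable surrogate mapping compact parameters to a metric."""

    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, theta: torch.Tensor) -> torch.Tensor:
        return self.net(theta).squeeze(-1)


def fit_value_fn(value_fn, vf_opt, observations):
    """Regress the surrogate onto observed (theta, metric) pairs.

    `observations` holds detached snapshots of the compact parameters
    paired with black-box metric evaluations (e.g. error rate measured
    on a held-out set) -- no gradient flows back into the main model.
    """
    for theta, metric in observations:
        vf_opt.zero_grad()
        loss = (value_fn(theta) - metric) ** 2
        loss.backward()
        vf_opt.step()


def finetune_step(model, compact, value_fn, loss_fn, batch, opt, alpha=0.1):
    """One fine-tuning step: task loss plus value-function supervision.

    `compact` is the small subset of model parameters the surrogate is
    conditioned on. Minimizing the predicted metric (assuming lower is
    better, e.g. misclassification rate) backpropagates an approximate
    metric gradient alongside the ordinary loss gradient.
    """
    opt.zero_grad()
    x, y = batch
    task_loss = loss_fn(model(x), y)
    theta = torch.cat([p.reshape(-1) for p in compact])
    metric_pred = value_fn(theta)
    (task_loss + alpha * metric_pred).backward()
    opt.step()
```

In a training loop one would presumably alternate: collect (theta, metric) observations by evaluating the black-box metric at recent parameter snapshots, refresh the surrogate with `fit_value_fn`, then take `finetune_step` updates with any standard optimizer such as SGD or Adam.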
