Scalable Pareto Front Approximation for Deep Multi-Objective Learning

Multi-objective optimization is important for various Deep Learning applications, however, no prior multi-objective method suits very deep networks. Existing approaches either require training a new network for every solution on the Pareto front or add a considerable overhead to the number of parameters by introducing hyper-networks conditioned on modifiable preferences. In this paper, we present a novel method that contextualizes the network directly on the preferences by adding them to the input space. In addition, we ensure a well-spread Pareto front by forcing the solutions to preserve a small angle to the preference vector. Through extensive experiments, we demonstrate that our Pareto fronts achieve state-of-the-art quality despite being computed significantly faster. Furthermore, we demonstrate the scalability as our method approximates the full Pareto front on the CelebA dataset with an EfficientNet network at a marginal training time overhead of 7% compared to a single-objective optimization. We make the code publicly available at https://github.com/ruchtem/cosmos.

[1]  Frank Hutter,et al.  Efficient Multi-Objective Neural Architecture Search via Lamarckian Evolution , 2018, ICLR.

[2]  Gade Pandu Rangaiah,et al.  Application and Analysis of Methods for Selecting an Optimal Solution from the Pareto-Optimal Front obtained by Multiobjective Optimization , 2017 .

[3]  Andrew J. Davison,et al.  End-To-End Multi-Task Learning With Attention , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Geoffrey E. Hinton,et al.  Dynamic Routing Between Capsules , 2017, NIPS.

[5]  Xiaoou Tang,et al.  Facial Landmark Detection by Deep Multi-task Learning , 2014, ECCV.

[6]  Qingfu Zhang,et al.  Controllable Pareto Multi-Task Learning , 2020, ArXiv.

[7]  Runzhe Yang,et al.  A Generalized Algorithm for Multi-Objective RL and Policy Adaptation , 2019 .

[8]  Marcello Restelli,et al.  Multi-objective Reinforcement Learning through Continuous Pareto Manifold Approximation , 2016, J. Artif. Intell. Res..

[9]  Alexey Dosovitskiy,et al.  You Only Train Once: Loss-Conditional Training of Deep Networks , 2020, ICLR.

[10]  Roberto Cipolla,et al.  Multi-task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[11]  I-Cheng Yeh,et al.  The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients , 2009, Expert Syst. Appl..

[12]  E. Polak,et al.  On Multicriteria Optimization , 1976 .

[13]  Yu Zhang,et al.  A Survey on Multi-Task Learning , 2017, IEEE Transactions on Knowledge and Data Engineering.

[14]  Vaibhav Rajan,et al.  Multi-Task Learning with User Preferences: Gradient Descent with Controlled Ascent in Pareto Optimization , 2020, ICML.

[15]  Xiaogang Wang,et al.  Deep Learning Face Attributes in the Wild , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[16]  Lothar Thiele,et al.  The Hypervolume Indicator Revisited: On the Design of Pareto-compliant Indicators Via Weighted Integration , 2007, EMO.

[17]  Qingfu Zhang,et al.  Pareto Multi-Task Learning , 2019, NeurIPS.

[18]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[19]  Jitendra Malik,et al.  Which Tasks Should Be Learned Together in Multi-task Learning? , 2019, ICML.

[20]  Yoshua Bengio,et al.  Object Recognition with Gradient-Based Learning , 1999, Shape, Contour and Grouping in Computer Vision.

[21]  Jean-Antoine Désidéri,et al.  MUTIPLE-GRADIENT DESCENT ALGORITHM FOR MULTIOBJECTIVE OPTIMIZATION , 2012 .

[22]  Boi Faltings,et al.  Addressing Fairness in Classification with a Model-Agnostic Multi-Objective Algorithm , 2020, UAI.

[23]  Ann Nowé,et al.  Multi-objective reinforcement learning using sets of pareto dominating policies , 2014, J. Mach. Learn. Res..

[24]  Aditya Krishna Menon,et al.  The cost of fairness in binary classification , 2018, FAT.

[25]  Zhao Chen,et al.  GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks , 2017, ICML.

[26]  Gal Chechik,et al.  Learning the Pareto Front with Hypernetworks , 2020, ICLR.

[27]  Kalyanmoy Deb,et al.  A Fast Elitist Non-dominated Sorting Genetic Algorithm for Multi-objective Optimisation: NSGA-II , 2000, PPSN.

[28]  Jörg Fliege,et al.  Steepest descent methods for multicriteria optimization , 2000, Math. Methods Oper. Res..