Paper Information - Multi-Task Learning as Multi-Objective Optimization

Multi-Task Learning as Multi-Objective Optimization

In multi-task learning, multiple tasks are solved jointly, sharing inductive bias between them. Multi-task learning is inherently a multi-objective problem because different tasks may conflict, necessitating a trade-off. A common compromise is to optimize a proxy objective that minimizes a weighted linear combination of per-task losses. However, this workaround is only valid when the tasks do not compete, which is rarely the case. In this paper, we explicitly cast multi-task learning as multi-objective optimization, with the overall objective of finding a Pareto optimal solution. To this end, we use algorithms developed in the gradient-based multi-objective optimization literature. These algorithms are not directly applicable to large-scale learning problems since they scale poorly with the dimensionality of the gradients and the number of tasks. We therefore propose an upper bound for the multi-objective loss and show that it can be optimized efficiently. We further prove that optimizing this upper bound yields a Pareto optimal solution under realistic assumptions. We apply our method to a variety of multi-task deep learning problems including digit classification, scene understanding (joint semantic segmentation, instance segmentation, and depth estimation), and multi-label classification. Our method produces higher-performing models than recent multi-task learning formulations or per-task training.
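To make the contrast in the abstract concrete, the following is a minimal sketch of the two-task case of gradient-based multi-objective optimization: it finds the minimum-norm convex combination of two task gradients, which (when nonzero) gives a direction that does not increase either loss to first order. The function names and the toy gradients are hypothetical and chosen for illustration only; this is not the paper's implementation, and it does not cover the upper bound or the many-task solver the abstract mentions.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def min_norm_two_tasks(
    g1: Sequence[float], g2: Sequence[float]
) -> Tuple[float, List[float]]:
    """Return (alpha, d), where d = alpha * g1 + (1 - alpha) * g2 has the
    smallest Euclidean norm over alpha in [0, 1]."""
    diff = [a - b for a, b in zip(g1, g2)]
    denom = dot(diff, diff)
    if denom == 0.0:
        # Identical gradients: every alpha gives the same combination.
        alpha = 0.5
    else:
        # Unconstrained minimizer of ||g2 + alpha * (g1 - g2)||^2,
        # then clipped to the interval [0, 1].
        alpha = -dot(diff, g2) / denom
        alpha = min(1.0, max(0.0, alpha))
    d = [alpha * a + (1.0 - alpha) * b for a, b in zip(g1, g2)]
    return alpha, d


# Toy gradients (hypothetical) for two partially conflicting tasks.
g1 = [1.0, 0.0]
g2 = [-1.0, 1.0]

# Equal-weight sum of gradients: no first-order progress on task 1.
equal_sum = [a + b for a, b in zip(g1, g2)]
print("equal-weight sum:", equal_sum, "g1 . sum =", dot(g1, equal_sum))

# Min-norm combination: positive inner product with both task gradients.
alpha, d = min_norm_two_tasks(g1, g2)
print("alpha =", alpha, "d =", d)
print("g1 . d =", dot(g1, d), "g2 . d =", dot(g2, d))
```

In this toy case the equal-weight sum is orthogonal to the first task's gradient, while the min-norm combination has a positive inner product with both gradients, so a small step along its negative decreases both losses to first order.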

Ozan Sener | Vladlen Koltun
