Efficiently Identifying Task Groupings for Multi-Task Learning

Multi-task learning can leverage information learned by one task to benefit the training of other tasks. Despite this capacity, naively training all tasks together in one model often degrades performance, and exhaustively searching through combinations of task groupings can be prohibitively expensive. As a result, efficiently identifying the tasks that would benefit from training together remains a challenging design question without a clear solution. In this paper, we propose an approach to select which tasks should train together in multi-task learning models. Our method determines task groupings in a single run by training all tasks together and quantifying the extent to which one task's gradient step would affect another task's loss. On the large-scale Taskonomy computer vision dataset, we find this method can decrease test loss by 10.0% compared to simply training all tasks together while operating 11.6 times faster than a state-of-the-art task grouping method.
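
As a rough illustration of the measurement described above, the PyTorch-style sketch below estimates a pairwise "affinity" between tasks: apply one gradient step on the shared parameters using task i's loss alone, then record the relative change in task j's loss. The names (`shared_encoder`, `task_heads`, the single SGD lookahead step) are illustrative assumptions, not the paper's released implementation.

```python
# Minimal sketch of an inter-task affinity measurement (illustrative, not the
# authors' code). Assumes a shared encoder, per-task heads, and per-task losses.
import copy
import torch

def inter_task_affinity(shared_encoder, task_heads, losses, batch, lr=1e-3):
    """Return a dict mapping (i, j) -> relative change in task j's loss after
    a single SGD step on the shared parameters using task i's loss only."""
    affinities = {}
    tasks = list(task_heads.keys())

    # Baseline: every task's loss under the current shared parameters.
    features = shared_encoder(batch["x"])
    base_loss = {j: losses[j](task_heads[j](features), batch[j]) for j in tasks}

    for i in tasks:
        # Copy the shared encoder and take one lookahead step on task i's loss.
        lookahead = copy.deepcopy(shared_encoder)
        feats_i = lookahead(batch["x"])
        loss_i = losses[i](task_heads[i](feats_i), batch[i])
        grads = torch.autograd.grad(loss_i, list(lookahead.parameters()))
        with torch.no_grad():
            for p, g in zip(lookahead.parameters(), grads):
                p -= lr * g

            # Re-evaluate every task's loss on the updated shared parameters.
            feats_new = lookahead(batch["x"])
            for j in tasks:
                new_loss = losses[j](task_heads[j](feats_new), batch[j])
                # Positive value: task i's update lowered task j's loss.
                affinities[(i, j)] = 1.0 - (new_loss / base_loss[j]).item()
    return affinities
```

In this sketch, affinities measured at individual training steps would be averaged over the course of a single multi-task training run, and task groupings would then be chosen to maximize the total affinity within each group, subject to an inference-time budget on the number of models.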
