TOMATO: A Topic-Wise Multi-Task Sparsity Model

Multi-Task Learning (MTL) leverages inter-relationships across tasks and is useful for applications with limited data. Existing works articulate different task-relationship assumptions, whose validity is vital to successful multi-task training. We observe that, in many scenarios, the inter-relationship across tasks varies across different groups of data (i.e., topics), which we call the within-topic task relationship hypothesis. In this case, current MTL models with a homogeneous task-relationship assumption cannot fully exploit the different task relationships among different groups of data. Based on this observation, we propose a generalized topic-wise multi-task architecture to capture within-topic task relationships, which can be combined with any existing MTL design. Further, we propose a new specialized MTL design, topic-task-sparsity, along with two different types of sparsity constraints. The architecture, combined with the topic-task-sparsity design, constitutes our proposed TOMATO model. Experiments on both synthetic and 4 real-world datasets show that our proposed models consistently outperform 6 state-of-the-art models and 2 baselines, with improvements from 5% to 46% in task-wise comparison, demonstrating the validity of the proposed within-topic task relationship hypothesis. We release the source code and datasets of TOMATO at: https://github.com/JasonLC506/MTSEM.
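To make the topic-task-sparsity idea concrete, the sketch below shows one common way such a constraint can be realized: a group-lasso (l2,1) penalty over a topic x task x feature weight tensor, which pushes entire (topic, task) weight groups toward zero so each topic retains only the task-specific parameters it needs. The tensor layout and the exact penalty form are illustrative assumptions, not the formulation used in the paper.

```python
import numpy as np

def topic_task_group_sparsity(W: np.ndarray) -> float:
    """Group-lasso style penalty over (topic, task) weight groups.

    W is assumed to have shape (n_topics, n_tasks, n_features).
    The penalty is the sum of l2 norms of each group W[g, t, :];
    minimizing it drives whole groups to exactly zero, yielding
    topic-specific task sparsity patterns.
    """
    # l2 norm over the feature axis, then sum over topics and tasks
    return float(np.sqrt((W ** 2).sum(axis=-1)).sum())

# Toy example: 2 topics, 3 tasks, 4 features per task head.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3, 4))
penalty = topic_task_group_sparsity(W)
```

In training, a term like `lambda * topic_task_group_sparsity(W)` would be added to the task losses; the second sparsity variant mentioned in the abstract could analogously group weights differently (e.g., per task across topics).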
