Learning Multi-Level Task Groups in Multi-Task Learning

In multi-task learning (MTL), multiple related tasks are learned jointly by sharing information across them. Many MTL algorithms have been proposed to learn the underlying task groups. However, these methods learn task groups at only a single level, which may be insufficient to model the complex structure among tasks in many real-world applications. In this paper, we propose a Multi-Level Task Grouping (MeTaG) method that learns a multi-level grouping structure among tasks instead of a single-level one. Specifically, assuming the number of levels is H, we decompose the parameter matrix into a sum of H component matrices, each of which is regularized with an l2 norm on the pairwise differences among the parameters of all tasks to construct level-specific task groups. For optimization, we employ the smoothing proximal gradient method to efficiently solve the objective function of the MeTaG model. Moreover, we provide a theoretical analysis showing that, under certain conditions, the MeTaG model can recover the true parameter matrix and the true task groups at each level with high probability. Experiments on both synthetic and real-world datasets show competitive performance over state-of-the-art MTL methods.
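The decomposition described above can be sketched in code. The following is a minimal NumPy illustration, not the paper's implementation: it assumes a shared design matrix, a squared loss, and one regularization weight per level (names like `metag_objective` and `lams` are hypothetical). Each component matrix W_h is penalized by the l2 norm of every pairwise column difference, so pairs of tasks whose columns coincide at level h form a group at that level.

```python
import numpy as np

def metag_objective(X, Y, W_levels, lams):
    """Sketch of a MeTaG-style objective (assumed form, not the paper's exact one).

    X: (n, d) shared design matrix; Y: (n, m) targets for m tasks.
    W_levels: list of H component matrices, each (d, m); the full
    parameter matrix is their sum W = sum_h W_h.
    lams: list of H per-level regularization weights.
    """
    W = sum(W_levels)                        # combined parameter matrix
    loss = 0.5 * np.sum((X @ W - Y) ** 2)    # squared loss over all tasks
    penalty = 0.0
    for W_h, lam in zip(W_levels, lams):
        m = W_h.shape[1]
        # l2 norm of the difference between every pair of task columns;
        # a zero difference merges the two tasks into one group at this level.
        for i in range(m):
            for j in range(i + 1, m):
                penalty += lam * np.linalg.norm(W_h[:, i] - W_h[:, j])
    return loss + penalty
```

Because this penalty is non-smooth, plain gradient descent does not apply directly, which is why the paper resorts to the smoothing proximal gradient method.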
