Multiple task learning with flexible structure regularization

Owing to theoretical advances and empirical successes, multi-task learning (MTL) has become a popular paradigm for training a set of tasks jointly. By exploiting the hidden relationships among multiple tasks, many MTL algorithms have been developed to enhance learning performance. In general, these complicated hidden relationships can be viewed as a combination of two key structural elements: task grouping and task outliers. Building on this view of task relationships, we propose a generic MTL framework with flexible structure regularization, which aims to relax any specific structural assumption. In particular, we directly impose a joint ℓ1,1/ℓ2,1-norm as the regularization term to reveal the underlying task relationships in a flexible way. This flexible structure regularization term accounts for any convex combination of grouping and outlier structure among the multiple tasks. To solve the generic MTL framework efficiently, we develop two algorithms with different emphases and strengths: the Iteratively Reweighted Least Squares (IRLS) method and the Accelerated Proximal Gradient (APG) method. In addition, we analyze the theoretical convergence and performance guarantees of both algorithms. Finally, extensive experiments on both synthetic and real data, together with comparisons against several state-of-the-art algorithms, demonstrate the superior performance of the proposed generic MTL method.
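The two building blocks of the flexible regularizer can be sketched concretely. The following is an illustrative sketch, not the paper's implementation: it evaluates a convex combination of the entry-wise ℓ1,1 norm (which lets individual outlier entries be zeroed independently) and the row-wise ℓ2,1 norm (which zeroes whole feature rows shared across tasks) on a task-weight matrix W, together with the proximal operators of each term that a proximal-gradient (APG-style) solver would use. The names `alpha`, `flexible_norm`, `prox_l11`, and `prox_l21` are our own, and the prox of the combined penalty is not simply the composition of the two shown here.

```python
import math

def l11_norm(W):
    """Entry-wise l1 norm: sum of |w_ij|; promotes individual (outlier) sparsity."""
    return sum(abs(w) for row in W for w in row)

def l21_norm(W):
    """Sum of row-wise l2 norms; promotes shared (group) row sparsity."""
    return sum(math.sqrt(sum(w * w for w in row)) for row in W)

def flexible_norm(W, alpha):
    """Convex combination spanning the grouping/outlier spectrum (alpha in [0, 1])."""
    return alpha * l11_norm(W) + (1.0 - alpha) * l21_norm(W)

def prox_l11(W, t):
    """Entry-wise soft-thresholding: proximal operator of t * l_{1,1}."""
    return [[math.copysign(max(abs(w) - t, 0.0), w) for w in row] for row in W]

def prox_l21(W, t):
    """Row-wise shrinkage: proximal operator of t * l_{2,1}."""
    out = []
    for row in W:
        norm = math.sqrt(sum(w * w for w in row))
        scale = max(1.0 - t / norm, 0.0) if norm > 0 else 0.0
        out.append([scale * w for w in row])
    return out

# Toy weight matrix: rows are features, columns are tasks.
W = [[3.0, -4.0], [0.5, 0.2], [0.0, 0.0]]
penalty = flexible_norm(W, 0.5)   # halfway between pure l11 and pure l21
W_l11 = prox_l11(W, 1.0)          # small entries zeroed independently
W_l21 = prox_l21(W, 1.0)          # weak rows zeroed as a group
```

Setting alpha near 1 recovers a pure sparse (outlier-tolerant) penalty, while alpha near 0 recovers the group-sparse ℓ2,1 penalty of multi-task feature learning; intermediate values mix the two structures.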
