A Regularization Approach to Learning Task Relationships in Multitask Learning

Multitask learning is a learning paradigm that seeks to improve the generalization performance of a learning task with the help of other related tasks. In this article, we propose a regularization approach to learning the relationships between tasks in multitask learning, which can be viewed as a novel generalization of the regularized formulation for single-task learning. Besides modeling positive task correlation, our approach, multitask relationship learning (MTRL), can also describe negative task correlation and identify outlier tasks based on the same underlying principle. By using a matrix-variate normal distribution as a prior on the model parameters of all tasks, MTRL has a jointly convex objective function. For efficiency, we use an alternating method to learn the optimal model parameters of each task as well as the relationships between tasks. We study MTRL in the symmetric multitask learning setting and then generalize it to the asymmetric setting. We also discuss variants of the regularization approach that demonstrate the use of other matrix-variate priors for learning task relationships. To gain more insight into our model, we further study the relationships between MTRL and some existing multitask learning methods. Experiments on a toy problem and several benchmark datasets demonstrate the effectiveness of MTRL and the interpretability afforded by the learned task covariance matrix.
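To make the alternating scheme concrete, the sketch below implements it for the squared-loss case under the standard MTRL objective: minimize the sum of per-task empirical losses plus (lam1/2) tr(W W^T) + (lam2/2) tr(W Omega^{-1} W^T), subject to Omega being positive semidefinite with unit trace, where column i of the d x m matrix W holds the parameters of task i and the m x m matrix Omega is the task covariance. This is a minimal illustrative sketch, not the authors' reference implementation: the function name mtrl_fit, the parameter names lam1, lam2, n_iter, and eps, and the omission of per-task bias terms are all choices made here for brevity.

```python
import numpy as np
from scipy.linalg import sqrtm

def mtrl_fit(X_list, y_list, lam1=0.1, lam2=0.1, n_iter=50, eps=1e-8):
    """Alternating-optimization sketch for MTRL with squared loss.

    X_list[i] is the (n_i, d) input matrix and y_list[i] the (n_i,)
    target vector of task i. Returns the (d, m) weight matrix W (one
    column per task) and the (m, m) task covariance matrix Omega.
    Bias terms are omitted for brevity.
    """
    m = len(X_list)
    d = X_list[0].shape[1]
    Omega = np.eye(m) / m  # feasible start: PSD with unit trace
    for _ in range(n_iter):
        # Step 1: fix Omega, solve for W. The regularizer
        # tr(W Omega^{-1} W^T) couples the tasks, so we solve the
        # stacked (m*d) x (m*d) linear system given by the first-order
        # optimality condition of this convex subproblem.
        Omega_inv = np.linalg.inv(Omega + eps * np.eye(m))
        A = np.zeros((m * d, m * d))
        rhs = np.zeros(m * d)
        for i in range(m):
            Xi, yi = X_list[i], y_list[i]
            blk = slice(i * d, (i + 1) * d)
            A[blk, blk] += (2.0 / len(yi)) * Xi.T @ Xi + lam1 * np.eye(d)
            rhs[blk] = (2.0 / len(yi)) * Xi.T @ yi
            for k in range(m):
                A[blk, k * d:(k + 1) * d] += lam2 * Omega_inv[i, k] * np.eye(d)
        W = np.linalg.solve(A, rhs).reshape(m, d).T
        # Step 2: fix W, update Omega in closed form:
        # Omega = (W^T W)^{1/2} / tr((W^T W)^{1/2}), the minimizer of
        # tr(W Omega^{-1} W^T) over PSD matrices with unit trace.
        S = np.real(sqrtm(W.T @ W + eps * np.eye(m)))
        Omega = S / np.trace(S)
    return W, Omega
```

With Omega fixed, the problem in W is a coupled ridge regression solved as one linear system; with W fixed, Omega has a closed-form update. Since the overall objective is jointly convex, alternating these two convex steps is what makes the approach efficient, and the signs of the learned Omega reveal positive or negative task correlations and outlier tasks.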
