A Convex Feature Learning Formulation for Latent Task Structure Discovery

This paper considers the multi-task learning problem in the setting where some relevant features may be shared across a few related tasks. Most existing methods assume that the extent to which the given tasks are related, or share a common feature space, is known a priori. In real-world applications, however, it is desirable to automatically discover the groups of related tasks that share a feature space. In this paper we aim at searching the exponentially large space of all possible groups of tasks that may share a feature space. The main contribution is a convex formulation that employs a graph-based regularizer and simultaneously discovers a few groups of related tasks, having close-by task parameters, as well as the feature space shared within each group. The regularizer encodes an important structure among the groups of tasks that leads to an efficient algorithm: if there is no feature space under which a group of tasks has close-by task parameters, then no such feature space exists for any of its supersets. An efficient active set algorithm that exploits this property and performs a clever search in the exponentially large space is presented. The algorithm is guaranteed to solve the proposed formulation (within some precision) in time polynomial in the number of groups of related tasks discovered. Empirical results on benchmark datasets show that the proposed formulation achieves good generalization and, in some cases, outperforms state-of-the-art multi-task learning algorithms.
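The following is a minimal sketch of the kind of superset-pruned search the abstract describes, not the paper's actual active set algorithm. It assumes a hypothetical oracle `group_helps(group)` that stands in for the paper's check of whether a candidate group of tasks admits a shared feature space with close-by task parameters (in the formulation this would correspond to testing a violated optimality condition); all names and the level-wise expansion strategy are illustrative.

```python
def active_set_search(tasks, group_helps, max_size=None):
    """Apriori-style search over groups of tasks with superset pruning.

    tasks       : iterable of task identifiers.
    group_helps : hypothetical oracle; returns True if the group of tasks
                  admits a shared feature space with close-by parameters
                  and should therefore enter the active set.
    Pruning rule (from the abstract): if a group fails the check, none of
    its supersets can succeed, so failed groups are never extended.
    """
    tasks = list(tasks)
    max_size = max_size or len(tasks)
    active = []                                   # groups selected so far
    frontier = [frozenset([t]) for t in tasks]    # start from singletons
    while frontier:
        survivors = []
        for g in frontier:
            if group_helps(g):
                active.append(g)
                survivors.append(g)   # only successful groups are extended
            # failed groups are dropped together with all their supersets
        # next level: grow each surviving group by one task
        next_frontier = set()
        for g in survivors:
            for t in tasks:
                sup = g | {t}
                if len(sup) == len(g) + 1 and len(sup) <= max_size:
                    next_frontier.add(sup)
        frontier = list(next_frontier)
    return active
```

Under this monotonicity assumption, the number of oracle calls is governed by the number of groups that actually pass the check (plus their immediate extensions), which mirrors the abstract's claim of a running time polynomial in the number of groups discovered rather than in the size of the full power set.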
