Efficient Output Kernel Learning for Multiple Tasks

The paradigm of multi-task learning is that one can achieve better generalization by learning tasks jointly, thus exploiting the similarity between the tasks, rather than learning them independently of each other. While previously the relationship between tasks had to be user-defined in the form of an output kernel, recent approaches jointly learn the tasks and the output kernel. As the output kernel is a positive semidefinite matrix, the resulting optimization problems are not scalable in the number of tasks, since an eigendecomposition is required in each step. Using the theory of positive semidefinite kernels, we show in this paper that, for a certain class of regularizers on the output kernel, the constraint of being positive semidefinite can be dropped, as it is automatically satisfied for the relaxed problem. This leads to an unconstrained dual problem which can be solved efficiently. Experiments on several multi-task and multi-class data sets illustrate the efficacy of our approach in terms of computational efficiency as well as generalization performance.
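The claim that the positive semidefiniteness constraint is automatically satisfied rests on a classical fact from the theory of positive semidefinite kernels: suitable entrywise (Hadamard) functions of a PSD matrix are again PSD. The entrywise exponential is the standard example, since by the Schur product theorem every Hadamard power of a PSD matrix is PSD, and exp applied entrywise is a nonnegative combination of such powers. The sketch below is only a numerical illustration of this fact, not the paper's algorithm; the helper `is_psd` is an assumed name introduced here for the demo.

```python
# Numerical sketch: entrywise exp of a PSD matrix stays PSD.
# This illustrates the structural fact behind dropping the PSD constraint;
# it is NOT the optimization method proposed in the paper.
import numpy as np

def is_psd(M, tol=1e-8):
    """Check positive semidefiniteness via the smallest eigenvalue."""
    return np.linalg.eigvalsh(M).min() >= -tol

rng = np.random.default_rng(0)
B = rng.standard_normal((6, 4))
A = B @ B.T                      # random PSD matrix of rank <= 4

assert is_psd(A)                 # A itself is PSD by construction
assert is_psd(np.exp(A))         # entrywise exp(A) is PSD as well
```

Note that this closure property fails for arbitrary entrywise maps (e.g. fractional Hadamard powers of matrices with negative entries), which is why the result in the paper holds only for a certain class of regularizers.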
