Pareto-Path Multitask Multiple Kernel Learning

A traditional and intuitively appealing Multitask Multiple Kernel Learning (MT-MKL) approach is to optimize the sum (thus, the average) of the task objective functions with a (partially) shared kernel function, which allows information sharing among the tasks. We point out that the solution obtained in this way corresponds to a single point on the Pareto Front (PF) of a multiobjective optimization problem, which considers the concurrent optimization of all task objectives involved in the Multitask Learning (MTL) problem. Motivated by this observation, and arguing that the former approach is heuristic, we propose a novel Support Vector Machine (SVM) MT-MKL framework that considers an implicitly defined set of conic combinations of task objectives. We show that solving our framework produces solutions along a path on the aforementioned PF and that it subsumes the optimization of the average of the objective functions as a special case. Using the algorithms we derive, we demonstrate through a series of experiments that the framework is capable of achieving better classification performance than comparable MTL approaches.
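As a minimal sketch of how the "average of objectives" relates to conic combinations and the Pareto front, consider a standard weighted-sum scalarization (the symbols $f_t$, $\lambda_t$, $\theta$, and $T$ are illustrative and not taken from the paper):

$$\min_{\theta}\ \sum_{t=1}^{T} \lambda_t\, f_t(\theta), \qquad \lambda_t \ge 0,\ \sum_{t=1}^{T} \lambda_t = 1,$$

where $f_t$ denotes the objective of task $t$. For strictly positive weights $\lambda_t > 0$, any minimizer of this scalarized problem is a Pareto-optimal point of the multiobjective problem $\min_{\theta}\,(f_1(\theta), \ldots, f_T(\theta))$. Choosing $\lambda_t = 1/T$ recovers the usual averaged objective, i.e., a single point on the Pareto front, whereas varying the weights traces out a family of Pareto-optimal solutions, which is the intuition behind the path of solutions discussed above.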
