Multiple Kernel Learning and the SMO Algorithm

Our objective is to train p-norm Multiple Kernel Learning (MKL) and, more generally, linear MKL regularised by the Bregman divergence, using the Sequential Minimal Optimization (SMO) algorithm. The SMO algorithm is simple, easy to implement and adapt, and scales efficiently to large problems. As a result, it has gained widespread acceptance, and SVMs are routinely trained with SMO in diverse real-world applications. Training via SMO has been a long-standing goal in MKL for the very same reasons. Unfortunately, the standard MKL dual is not differentiable and therefore cannot be optimised using SMO-style coordinate ascent. In this paper, we demonstrate that linear MKL regularised with the p-norm squared, or with certain Bregman divergences, can indeed be trained using SMO. The resulting algorithm retains both simplicity and efficiency and is significantly faster than state-of-the-art specialised p-norm MKL solvers. We show that we can train on a hundred thousand kernels in approximately seven minutes and on fifty thousand points in less than half an hour on a single core.
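To make the central point concrete, below is a minimal Python sketch of SMO-style two-variable coordinate ascent on a smooth p-norm-squared MKL dual. The specific dual form D(α) = 1ᵀα − ‖g‖²_q / (8λ) with g_k = αᵀH_kα, H_k = diag(y) K_k diag(y), and conjugate exponent q = p/(p−1), the constant 1/(8λ), the grid-search line search (a crude stand-in for the analytic/Newton step a real solver would use), and all variable names and toy data are illustrative assumptions, not the paper's exact derivation.

```python
import numpy as np

def dual(alpha, H_list, p, lam):
    """Assumed smooth p-norm-squared MKL dual:
    D(a) = 1'a - ||g||_q^2 / (8*lam), with g_k = a' H_k a >= 0 for PSD H_k,
    H_k = diag(y) K_k diag(y), and q = p/(p-1) the conjugate exponent.
    Because this objective is differentiable, SMO-style ascent applies."""
    q = p / (p - 1.0)
    g = np.array([alpha @ H @ alpha for H in H_list])
    return alpha.sum() - np.sum(g ** q) ** (2.0 / q) / (8.0 * lam)

def smo_step(alpha, i, j, y, H_list, p, lam, C, grid=25):
    """One SMO-style step: move alpha_i and alpha_j along the direction
    (y_i, -y_j) that preserves the equality constraint y'alpha = 0, then
    pick the best step size by grid search over the feasible interval."""
    si, sj = y[i], -y[j]
    # Feasible step range keeping both coordinates inside the box [0, C].
    lo_i, hi_i = (-alpha[i], C - alpha[i]) if si > 0 else (alpha[i] - C, alpha[i])
    lo_j, hi_j = (-alpha[j], C - alpha[j]) if sj > 0 else (alpha[j] - C, alpha[j])
    lo, hi = max(lo_i, lo_j), min(hi_i, hi_j)
    if lo >= hi:
        return alpha
    best, best_val = alpha, dual(alpha, H_list, p, lam)
    for t in np.linspace(lo, hi, grid):
        trial = alpha.copy()
        trial[i] += si * t
        trial[j] += sj * t
        val = dual(trial, H_list, p, lam)
        if val > best_val:
            best, best_val = trial, val
    return best

# Toy usage on synthetic data: two random PSD kernels, random +/-1 labels.
rng = np.random.default_rng(0)
n = 20
y = rng.choice([-1.0, 1.0], size=n)
H_list = []
for _ in range(2):
    A = rng.standard_normal((n, 5))
    H_list.append(np.outer(y, y) * (A @ A.T))  # H_k = diag(y) K_k diag(y)

alpha = np.zeros(n)
for _ in range(500):
    i, j = rng.choice(n, size=2, replace=False)
    alpha = smo_step(alpha, i, j, y, H_list, p=2.0, lam=1.0, C=1.0)
print("dual value:", dual(alpha, H_list, p=2.0, lam=1.0))
```

A real implementation would cache the kernel products g_k and update them incrementally after each two-variable step rather than recomputing every αᵀH_kα from scratch; that incremental bookkeeping is what lets an SMO-style solver reach the kernel and point counts quoted above.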
