Ultra-Fast Optimization Algorithm for Sparse Multi Kernel Learning

Many state-of-the-art approaches to Multi Kernel Learning (MKL) struggle to find a good compromise among performance, sparsity of the solution, and speed of the optimization process. In this paper we look at the MKL problem simultaneously from a learning and an optimization point of view. Instead of designing a regularizer and then struggling to find an efficient method to minimize it, we design the regularizer with the optimization algorithm in mind. Hence, we introduce a novel MKL formulation that mixes elements of p-norm and elastic-net regularization, and we propose a fast stochastic gradient descent method to solve it. We show theoretically and empirically that our method has 1) state-of-the-art performance on many classification tasks; 2) exact sparse solutions with a tunable level of sparsity; 3) a convergence-rate bound that depends only logarithmically on the number of kernels and is independent of the required level of sparsity; 4) independence of the particular convex loss function used.
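To make the idea concrete, a regularizer mixing a (2,p)-group norm with an elastic-net-style group term over the per-kernel weight blocks w_1, ..., w_F could take a form such as the one sketched below; the symbols alpha, beta, p and the block notation are illustrative assumptions here, and the exact formulation is the one given in the paper body.

\[
\Omega(\bar{w}) \;=\; \alpha\,\|\bar{w}\|_{2,p}^{2} \;+\; \beta \sum_{j=1}^{F} \|w_j\|_{2},
\qquad
\|\bar{w}\|_{2,p} := \Big(\sum_{j=1}^{F} \|w_j\|_{2}^{p}\Big)^{1/p},
\quad 1 < p \le 2 .
\]

Under these assumptions, the first term is strongly convex with respect to the (2,p)-group norm, which is what lets a stochastic (sub)gradient method converge quickly, while the second term induces exact group sparsity, zeroing out entire kernels and making the level of sparsity tunable through the trade-off parameter beta.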
