A Geometric Algorithm for Scalable Multiple Kernel Learning

We present a geometric formulation of the Multiple Kernel Learning (MKL) problem. To do so, we reinterpret the problem of learning kernel weights as searching for a kernel that maximizes the minimum (kernel) distance between two convex polytopes. This interpretation, combined with novel structural insights from our geometric formulation, allows us to reduce the MKL problem to a simple optimization routine with provable guarantees on both convergence and solution quality. As a result, our method scales efficiently to much larger datasets than most prior methods can handle. Empirical evaluation on eleven datasets shows that our method is significantly faster and even compares favorably with a uniform, unweighted combination of kernels.
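To make the max-min view concrete, the sketch below assumes the two polytopes are the convex hulls of the mapped positive and negative points, and that the learned kernel is a convex combination K_mu = sum_k mu_k K_k with mu on the simplex. The squared kernel distance between a point of each hull, sum_i alpha_i phi(p_i) - sum_j beta_j phi(q_j), needs only Gram-matrix entries, and for fixed (alpha, beta) it is linear in mu. The inner minimization here uses Frank-Wolfe steps (Gilbert's algorithm) with exact line search, and the outer search over mu is a naive grid over two kernels. Both are illustrative choices of ours, not the authors' routine, and every function name (`hull_sq_dist`, `best_mu_on_grid`, and so on) is hypothetical.

```python
import math

def rbf_kernel(gamma):
    """Gaussian RBF kernel k(x, y) = exp(-gamma * ||x - y||^2); note k(x, x) = 1."""
    def k(x, y):
        return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
    return k

def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))

def combine(kernels, mu):
    """Convex combination K_mu = sum_k mu_k K_k (mu assumed to lie on the simplex)."""
    return lambda x, y: sum(m * k(x, y) for m, k in zip(mu, kernels))

def gram(k, X, Y):
    return [[k(x, y) for y in Y] for x in X]

def sq_dist(k, P, Q, alpha, beta):
    """||sum_i alpha_i phi(p_i) - sum_j beta_j phi(q_j)||^2 from kernel evaluations only."""
    A, B, C = gram(k, P, P), gram(k, Q, Q), gram(k, P, Q)
    n, m = len(P), len(Q)
    aa = sum(alpha[i] * A[i][l] * alpha[l] for i in range(n) for l in range(n))
    bb = sum(beta[j] * B[j][l] * beta[l] for j in range(m) for l in range(m))
    ab = sum(alpha[i] * C[i][j] * beta[j] for i in range(n) for j in range(m))
    return aa - 2 * ab + bb

def hull_sq_dist(k, P, Q, iters=1000, tol=1e-10):
    """Approximate the minimum squared kernel distance between conv(phi(P)) and
    conv(phi(Q)) with Frank-Wolfe (Gilbert-style) steps and exact line search.
    Returns (distance, alpha, beta); the distance is an upper bound on the true minimum."""
    n, m = len(P), len(Q)
    A, B, C = gram(k, P, P), gram(k, Q, Q), gram(k, P, Q)
    alpha, beta = [1.0 / n] * n, [1.0 / m] * m  # start at the two centroids
    for _ in range(iters):
        # <phi(p_i), w> and <phi(q_j), w> for w = sum alpha phi(P) - sum beta phi(Q)
        wp = [sum(A[i][l] * alpha[l] for l in range(n))
              - sum(C[i][j] * beta[j] for j in range(m)) for i in range(n)]
        wq = [sum(C[i][j] * alpha[i] for i in range(n))
              - sum(B[j][l] * beta[l] for l in range(m)) for j in range(m)]
        f = sum(alpha[i] * wp[i] for i in range(n)) - sum(beta[j] * wq[j] for j in range(m))
        # Linear minimization: vertex s = phi(p_i) - phi(q_j) minimizing <s, w>
        i = min(range(n), key=lambda t: wp[t])
        j = max(range(m), key=lambda t: wq[t])
        ws = wp[i] - wq[j]
        gap = f - ws  # half the Frank-Wolfe duality gap, <w, w - s>
        if gap <= tol:
            break
        denom = f - 2 * ws + (A[i][i] - 2 * C[i][j] + B[j][j])  # ||w - s||^2
        if denom <= 0:
            break
        step = min(1.0, gap / denom)  # exact line search on the quadratic, clipped to [0, 1]
        alpha = [(1 - step) * a for a in alpha]
        alpha[i] += step
        beta = [(1 - step) * b for b in beta]
        beta[j] += step
    return sq_dist(k, P, Q, alpha, beta), alpha, beta

def best_mu_on_grid(kernels, P, Q, steps=10):
    """Naive outer search for two kernels: try mu = (t, 1 - t) on a grid and keep the
    weighting with the largest hull distance. Illustration only, not the paper's method."""
    best = None
    for s in range(steps + 1):
        mu = (s / steps, 1 - s / steps)
        d, _, _ = hull_sq_dist(combine(kernels, mu), P, Q)
        if best is None or d > best[1]:
            best = (mu, d)
    return best
```

For example, with `P = [(0.0, 0.0), (0.0, 1.0)]` and `Q = [(2.0, 0.0), (2.0, 1.0)]`, `hull_sq_dist(linear_kernel, P, Q)` returns a squared distance of 4.0, since the hulls are the segments x = 0 and x = 2. Restricting to unit-diagonal kernels such as the RBF keeps the trace of K_mu fixed on the simplex, so larger distances are not obtained merely by scaling up the kernel.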
