Journal of Machine Learning Research X (2008) 1-34 Submitted 01/08; Revised 08/08; Published XX/XX

Multiple kernel learning (MKL) aims at simultaneously learning a kernel and the associated predictor in supervised learning settings. For the support vector machine, an efficient and general multiple kernel learning algorithm based on semi-infinite linear programming has recently been proposed. This approach has opened new perspectives, since it makes MKL tractable for large-scale problems by iteratively using existing support vector machine code. However, this iterative algorithm turns out to need numerous iterations to converge towards a reasonable solution. In this paper, we address the MKL problem through a weighted 2-norm regularization formulation with an additional constraint on the weights that encourages sparse kernel combinations. Apart from learning the combination, we solve a standard SVM optimization problem in which the kernel is defined as a linear combination of multiple kernels. We propose an algorithm, named SimpleMKL, for solving this MKL problem, and we provide new insight into MKL algorithms based on mixed-norm regularization by showing that the two approaches are equivalent. We show how SimpleMKL can be applied beyond binary classification, to problems such as regression, clustering (one-class classification) and multiclass classification. Experimental results show that the proposed algorithm converges rapidly and that its efficiency compares favorably with that of other MKL algorithms. Finally, we illustrate the usefulness of MKL for regressors based on wavelet kernels and for model selection problems related to multiclass classification.

©2008 Rakotomamonjy et al.
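The equivalence between the weighted 2-norm formulation and mixed-norm regularization rests on a variational identity. For nonnegative block norms a_m = ||f_m||, minimizing the weighted penalty sum_m a_m^2 / d_m over the simplex (d_m >= 0, sum_m d_m = 1) gives (sum_m a_m)^2, the squared mixed (l1 of l2) norm. The minimizer is d_m = a_m / sum_k a_k, which follows from the Cauchy-Schwarz inequality. The sketch below checks this identity numerically and builds a combined Gram matrix K = sum_m d_m K_m from Gaussian kernels. It is an illustration under stated assumptions, not the SimpleMKL algorithm itself: the function names, the 1-D inputs and the bandwidths are hypothetical choices, and neither the SVM solve nor the update of d used by SimpleMKL is shown.

```python
import math
import random


def gaussian_gram(xs, sigma):
    """Gram matrix of a Gaussian kernel with bandwidth sigma on 1-D inputs (illustrative choice)."""
    return [[math.exp(-(a - b) ** 2 / (2 * sigma ** 2)) for b in xs] for a in xs]


def combined_gram(grams, d):
    """Combined kernel K = sum_m d_m K_m, with weights d on the simplex."""
    assert all(w >= 0 for w in d) and abs(sum(d) - 1.0) < 1e-9
    n = len(grams[0])
    return [[sum(w * K[i][j] for w, K in zip(d, grams)) for j in range(n)]
            for i in range(n)]


def weighted_penalty(norms, d):
    """Weighted 2-norm penalty sum_m ||f_m||^2 / d_m."""
    total = 0.0
    for a, w in zip(norms, d):
        if w > 0:
            total += a * a / w
        elif a > 0:
            return math.inf  # a nonzero block cannot have zero weight
    return total


def optimal_weights(norms):
    """Simplex minimizer of weighted_penalty: d_m = ||f_m|| / sum_k ||f_k||."""
    total = sum(norms)
    if total == 0:
        raise ValueError("all block norms are zero")
    return [a / total for a in norms]


if __name__ == "__main__":
    norms = [3.0, 1.0, 0.0]  # hypothetical block norms ||f_m||
    d_star = optimal_weights(norms)
    print(d_star, weighted_penalty(norms, d_star), sum(norms) ** 2)

    # Any other point of the simplex gives a penalty at least (sum_m ||f_m||)^2.
    rng = random.Random(0)
    for _ in range(1000):
        raw = [rng.random() + 1e-3 for _ in norms]
        d = [r / sum(raw) for r in raw]
        assert weighted_penalty(norms, d) >= sum(norms) ** 2 - 1e-9

    # Combined Gram matrix from three Gaussian kernels of different bandwidths.
    xs = [0.0, 0.5, 1.0]
    grams = [gaussian_gram(xs, s) for s in (0.1, 1.0, 10.0)]
    print(combined_gram(grams, d_star))
```

In this example the kernel whose block norm is zero receives zero weight at the optimum. The l1-like mixed norm tends to drive some block norms to zero, which is how the simplex constraint on d can yield sparse kernel combinations.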
