More efficiency in multiple kernel learning

An efficient and general multiple kernel learning (MKL) algorithm was recently proposed by Sonnenburg et al. (2006). This approach opened new perspectives because it makes MKL tractable for large-scale problems by iteratively calling existing support vector machine (SVM) code. However, this iterative algorithm needs several iterations before converging towards a reasonable solution. In this paper, we address the MKL problem through an adaptive 2-norm regularization formulation. Weights on each kernel matrix are included in the standard SVM empirical risk minimization problem, with an ℓ1 constraint to encourage sparsity. We propose an algorithm for solving this problem and provide new insight into MKL algorithms based on block 1-norm regularization by showing that the two approaches are equivalent. Experimental results show that the resulting algorithm converges rapidly and that its efficiency compares favorably with that of other MKL algorithms.
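The weighted formulation and its link to block 1-norm regularization can be sketched as follows. The notation is an assumption made for illustration and is not taken from the abstract: M kernels with associated spaces H_k, kernel weights d_k, component functions f_k, slack variables ξ_i, labels y_i, and trade-off parameter C, with the convention that f_k = 0 whenever d_k = 0.

```latex
% Adaptive 2-norm (weighted) formulation: each kernel k carries a weight d_k,
% and the weights lie on the simplex (an l1 constraint with d_k >= 0).
\begin{aligned}
\min_{\{f_k\},\, b,\, \xi,\, d}\quad
  & \frac{1}{2}\sum_{k=1}^{M}\frac{1}{d_k}\,\|f_k\|_{\mathcal{H}_k}^{2}
    + C\sum_{i=1}^{n}\xi_i \\
\text{s.t.}\quad
  & y_i\Big(\sum_{k=1}^{M} f_k(x_i) + b\Big) \ge 1-\xi_i,
    \qquad \xi_i \ge 0, \quad i=1,\dots,n, \\
  & \sum_{k=1}^{M} d_k = 1, \qquad d_k \ge 0, \quad k=1,\dots,M.
\end{aligned}

% For fixed f_k, minimizing over d on the simplex (Cauchy-Schwarz) gives
\min_{d \ge 0,\ \sum_k d_k = 1}\ \sum_{k=1}^{M}\frac{\|f_k\|_{\mathcal{H}_k}^{2}}{d_k}
  = \Big(\sum_{k=1}^{M}\|f_k\|_{\mathcal{H}_k}\Big)^{2},
\qquad
d_k^{\star} = \frac{\|f_k\|_{\mathcal{H}_k}}{\sum_{j=1}^{M}\|f_j\|_{\mathcal{H}_j}} .
```

Under these assumptions, eliminating d turns the weighted 2-norm penalty into a squared block 1-norm penalty on the f_k, which is one way to read the equivalence claimed above. Kernels whose component f_k vanishes receive weight d_k = 0, which is the source of sparsity in the kernel combination.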

[1] Grace Wahba, et al. Spline Models for Observational Data, 1990.

[2] Claude Lemaréchal, et al. Practical Aspects of the Moreau-Yosida Regularization: Theoretical Preliminaries, 1997, SIAM J. Optim.

[3] Alexander Shapiro, et al. Optimization Problems with Perturbations: A Guided Tour, 1998, SIAM Rev.

[4] Alexander J. Smola, et al. Learning with kernels, 1998.

[5] Yves Grandvalet. Least Absolute Shrinkage is Equivalent to Quadratic Penalization, 1998.

[6] Kiri Wagstaff, et al. Alpha seeding for support vector machines, 2000, KDD '00.

[7] Yves Grandvalet, et al. Adaptive Scaling for Feature Selection in SVMs, 2002, NIPS.

[8] Jean Charles Gilbert, et al. Numerical Optimization: Theoretical and Practical Aspects, 2003.

[9] Nello Cristianini, et al. Learning the Kernel Matrix with Semidefinite Programming, 2002, J. Mach. Learn. Res.

[10] Michael I. Jordan, et al. Multiple kernel learning, conic duality, and the SMO algorithm, 2004, ICML.

[11] Nello Cristianini, et al. A statistical framework for genomic data fusion, 2004, Bioinform.

[12] Sayan Mukherjee, et al. Choosing Multiple Parameters for Support Vector Machines, 2002, Machine Learning.

[13] Charles A. Micchelli, et al. Learning the Kernel Function via Regularization, 2005, J. Mach. Learn. Res.

[14] J. Frédéric Bonnans, et al. Numerical Optimization: Theoretical and Practical Aspects (Universitext), 2006.

[15] Gunnar Rätsch, et al. Large Scale Multiple Kernel Learning, 2006, J. Mach. Learn. Res.