TR-178 Infinite Kernel Learning

In this paper we consider the problem of automatically learning the kernel from general kernel classes. Specifically, we build upon the Multiple Kernel Learning (MKL) framework, and in particular on the work of (Argyriou, Hauser, Micchelli, & Pontil, 2006). We formulate the problem as a Semi-Infinite Program (SIP) and devise a new algorithm, Infinite Kernel Learning (IKL), to solve it. The IKL algorithm is applicable to both the finite and the infinite case, and we find it to be faster and more stable than SimpleMKL (Rakotomamonjy, Bach, Canu, & Grandvalet, 2007) when the number of kernels is large. In the second part we present the first large-scale comparison of SVMs to MKL on a variety of benchmark datasets, and we include IKL in the comparison. The results show two things: a) for many datasets there is no benefit in linearly combining kernels with MKL/IKL instead of using the SVM classifier, so the flexibility of using more than one kernel seems to be of no use; b) on some datasets IKL yields impressive increases in accuracy over SVM/MKL because it can use a much larger kernel set. In those cases IKL remains practical, whereas both cross-validation and standard MKL are infeasible.
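To make "linearly combining kernels" concrete, the following is a minimal sketch of a convex combination of base kernels, the object whose weights MKL-style methods learn. It is not the IKL algorithm itself. The Gaussian kernel family, the bandwidths in `gammas`, the fixed `weights`, and all function names are assumptions chosen for illustration; the abstract does not specify them.

```python
import math

def gaussian_kernel(x, y, gamma):
    """Gaussian base kernel exp(-gamma * ||x - y||^2) (assumed kernel family)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def combined_kernel(x, y, gammas, weights):
    """Convex combination sum_m d_m * k_m(x, y) with d_m >= 0 and sum_m d_m = 1."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * gaussian_kernel(x, y, g) for g, w in zip(gammas, weights))

def gram_matrix(points, gammas, weights):
    """Gram matrix of the combined kernel on a list of points."""
    return [[combined_kernel(p, q, gammas, weights) for q in points]
            for p in points]

# Hypothetical toy data and hand-picked weights; a kernel learning method
# would choose the weights from training data instead of fixing them.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
gammas = [0.1, 1.0, 10.0]
weights = [0.5, 0.3, 0.2]
K = gram_matrix(points, gammas, weights)
```

In the finite setting the candidate set is a fixed list such as `gammas`. The infinite case mentioned in the abstract would presumably replace that list with a continuously parameterized family, which a sketch like this one cannot enumerate.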

[1] Andrew Zisserman, et al. Representing shape with a spatial pyramid kernel, 2007, CIVR '07.

[2] Yves Grandvalet, et al. More efficiency in multiple kernel learning, 2007, ICML '07.

[3] Cheng Soon Ong, et al. Multiclass multiple kernel learning, 2007, ICML '07.

[4] G. Griffin, et al. Caltech-256 Object Category Dataset, 2007.

[5] Gunnar Rätsch, et al. Robust Boosting via Convex Optimization: Theory and Applications, 2007.

[6] Gunnar Rätsch, et al. Large Scale Multiple Kernel Learning, 2006, J. Mach. Learn. Res.

[7] Charles A. Micchelli, et al. A DC-programming algorithm for kernel selection, 2006, ICML.

[8] Lorenz T. Biegler, et al. On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming, 2006, Math. Program.

[9] S. Sathiya Keerthi, et al. Which Is the Best Multiclass SVM Method? An Empirical Study, 2005, Multiple Classifier Systems.

[10] Yoram Baram, et al. Learning by Kernel Polarization, 2005, Neural Computation.

[11] Pietro Perona, et al. Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories, 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[12] Michael I. Jordan, et al. Multiple kernel learning, conic duality, and the SMO algorithm, 2004, ICML.

[13] Nello Cristianini, et al. Learning the Kernel Matrix with Semidefinite Programming, 2002, J. Mach. Learn. Res.

[14] Sayan Mukherjee, et al. Choosing Multiple Parameters for Support Vector Machines, 2002, Machine Learning.

[15] Tong Zhang, et al. Sequential greedy approximation for certain convex optimization problems, 2003, IEEE Trans. Inf. Theory.

[16] Koby Crammer, et al. Kernel Design Using Boosting, 2002, NIPS.

[17] Bernhard Schölkopf, et al. A Generalized Representer Theorem, 2001, COLT/EuroCOLT.

[18] N. Cristianini, et al. On Kernel-Target Alignment, 2001, NIPS.

[19] R. Horst, et al. DC Programming: Overview, 1999.

[20] K. Rose. Deterministic annealing for clustering, compression, classification, regression, and related optimization problems, 1998, Proc. IEEE.

[21] R. Horst, et al. Global Optimization (3rd edition), 1997.

[22] Kenneth O. Kortanek, et al. Semi-Infinite Programming: Theory, Methods, and Applications, 1993, SIAM Rev.

[23] G. Wahba, et al. Some results on Tchebycheffian spline functions, 1971.