Infinite Kernel Learning

In this paper we build upon the Multiple Kernel Learning (MKL) framework, and in particular on [2], which generalized it to infinitely many kernels. We rewrite the problem in the standard MKL formulation, which leads to a Semi-Infinite Program, and devise a new algorithm to solve it (Infinite Kernel Learning, IKL). The IKL algorithm is applicable to both the finite and the infinite case, and we find it to be faster and more stable than SimpleMKL [8]. Furthermore, we present the first large-scale comparison of SVMs to MKL on a variety of benchmark datasets, with IKL included in the comparison. The results show two things: a) for many datasets there is no benefit in using MKL/IKL instead of the SVM classifier, so the flexibility of using more than one kernel seems to be of no use; b) on some datasets IKL yields massive increases in accuracy over SVM/MKL because it can use a much larger kernel set. In those cases, parameter selection through cross-validation or MKL is not applicable.
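To make the notion of a "kernel set" concrete, the following is a minimal sketch of what MKL combines: a convex combination of base kernels. It is not the IKL algorithm and does not learn the weights. The choice of Gaussian base kernels, the function names `rbf_kernel` and `combined_kernel`, and the bandwidth values are assumptions made for illustration only.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gaussian kernel matrix k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq = np.sum(X ** 2, axis=1)
    # Squared pairwise distances; clip tiny negatives caused by round-off.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)

def combined_kernel(X, gammas, weights):
    """Convex combination sum_j d_j * k_{gamma_j} of Gaussian kernels.

    Assumption: the weights d_j are given, non-negative, and sum to one.
    An MKL solver would learn them; this sketch only evaluates the result.
    """
    weights = np.asarray(weights, dtype=float)
    if len(weights) != len(gammas):
        raise ValueError("need one weight per kernel parameter")
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to one")
    return sum(w * rbf_kernel(X, g) for w, g in zip(weights, gammas))

# Hypothetical usage: three bandwidths form a finite kernel set.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
K = combined_kernel(X, gammas=[0.1, 1.0, 10.0], weights=[0.2, 0.5, 0.3])
```

In the finite case the list `gammas` is fixed in advance. In the infinite case described in the abstract, the kernel parameter would instead range over a continuous set, with only finitely many kernels receiving non-zero weight at any time; how those kernels are selected is the subject of the paper and is not shown here.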

[1] G. Wahba, et al. Some results on Tchebycheffian spline functions, 1971.

[2] Koby Crammer, et al. Advances in Neural Information Processing Systems 14, 2002.

[3] Andrew Zisserman, et al. Representing shape with a spatial pyramid kernel, 2007, CIVR '07.

[4] Lorenz T. Biegler, et al. On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming, 2006, Math. Program.

[5] S. Sathiya Keerthi, et al. Which Is the Best Multiclass SVM Method? An Empirical Study, 2005, Multiple Classifier Systems.

[6] Pietro Perona, et al. Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories, 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[7] R. Horst, et al. DC Programming: Overview, 1999.

[8] Tong Zhang, et al. Sequential greedy approximation for certain convex optimization problems, 2003, IEEE Trans. Inf. Theory.

[9] Sayan Mukherjee, et al. Choosing Multiple Parameters for Support Vector Machines, 2002, Machine Learning.

[10] Nello Cristianini, et al. Learning the Kernel Matrix with Semidefinite Programming, 2002, J. Mach. Learn. Res.

[11] G. Griffin, et al. Caltech-256 Object Category Dataset, 2007.

[12] R. Horst, et al. Global Optimization (3rd edition), 1997.

[13] Charles A. Micchelli, et al. A DC-programming algorithm for kernel selection, 2006, ICML.

[14] Gunnar Rätsch, et al. Large Scale Multiple Kernel Learning, 2006, J. Mach. Learn. Res.

[15] K. Rose. Deterministic annealing for clustering, compression, classification, regression, and related optimization problems, 1998, Proc. IEEE.

[16] Michael I. Jordan, et al. Multiple kernel learning, conic duality, and the SMO algorithm, 2004, ICML.

[17] Kenneth O. Kortanek, et al. Semi-Infinite Programming: Theory, Methods, and Applications, 1993, SIAM Rev.

[18] Gunnar Rätsch, et al. Robust Boosting via Convex Optimization: Theory and Applications, 2007.

[19] Bernhard Schölkopf, et al. A Generalized Representer Theorem, 2001, COLT/EuroCOLT.

[20] Yoram Baram, et al. Learning by Kernel Polarization, 2005, Neural Computation.

[21] N. Cristianini, et al. On Kernel-Target Alignment, 2001, NIPS.

[22] Koby Crammer, et al. Kernel Design Using Boosting, 2002, NIPS.

[23] Yves Grandvalet, et al. More efficiency in multiple kernel learning, 2007, ICML '07.

[24] Cheng Soon Ong, et al. Multiclass multiple kernel learning, 2007, ICML '07.