Non-sparse Multiple Kernel Learning for Fisher Discriminant Analysis

We consider the problem of learning a linear combination of pre-specified kernel matrices in the Fisher discriminant analysis setting. Existing methods for this task impose $\ell_1$ norm regularisation on the kernel weights, which produces sparse solutions but may lead to a loss of information. In this paper, we propose to use $\ell_2$ norm regularisation instead. The resulting learning problem is formulated as a semi-infinite program and can be solved efficiently. Experiments on synthetic data and on a challenging object recognition benchmark demonstrate the relative advantages of the proposed method and its $\ell_1$ counterpart, and provide insight into how the regularisation norm should be chosen.
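To make the sparse versus non-sparse distinction concrete, the sketch below isolates one ingredient: choosing non-negative kernel weights $\beta_k$ that maximise a linear score $\sum_k \beta_k s_k$ under either an $\ell_1$ or an $\ell_2$ constraint, then forming the combined kernel $K = \sum_k \beta_k K_k$. It is an illustration under stated assumptions, not the paper's semi-infinite program: the per-kernel scores $s_k$ are hypothetical inputs, and the names `l1_weights`, `l2_weights` and `combine_kernels` are introduced here for illustration only.

```python
import numpy as np


def l1_weights(scores):
    """Maximise sum_k beta_k * s_k over the simplex {beta >= 0, sum_k beta_k = 1}.

    A linear objective over the simplex attains its maximum at a vertex,
    so all weight goes to the single highest-scoring kernel (sparse).
    """
    s = np.asarray(scores, dtype=float)
    beta = np.zeros_like(s)
    beta[np.argmax(s)] = 1.0
    return beta


def l2_weights(scores):
    """Maximise sum_k beta_k * s_k over {beta >= 0, ||beta||_2 <= 1}.

    By Cauchy-Schwarz the maximiser is the positive part of s rescaled to
    unit l2 norm, so every kernel with a positive score keeps some weight.
    """
    s = np.clip(np.asarray(scores, dtype=float), 0.0, None)
    norm = np.linalg.norm(s)
    if norm == 0.0:
        raise ValueError("at least one score must be positive")
    return s / norm


def combine_kernels(kernels, beta):
    """Return K = sum_k beta_k * K_k for a stack of kernels of shape (m, n, n)."""
    return np.tensordot(beta, np.asarray(kernels, dtype=float), axes=1)


# Hypothetical example: three kernels, the first two almost equally useful.
scores = [0.9, 0.85, 0.3]
print(l1_weights(scores))  # all weight on kernel 0
print(l2_weights(scores))  # all three kernels receive positive weight

# Hypothetical data: three linear kernels built from random features.
rng = np.random.default_rng(0)
feats = [rng.normal(size=(5, d)) for d in (2, 3, 4)]
kernels = np.stack([f @ f.T for f in feats])  # shape (3, 5, 5)
K = combine_kernels(kernels, l2_weights(scores))
print(K.shape)  # (5, 5)
```

Under the $\ell_1$ constraint the maximiser is a vertex of the simplex, so a kernel scoring almost as well as the best one is discarded entirely; under the $\ell_2$ constraint the weights are proportional to the positive scores, so that kernel is retained. This mirrors the information-loss concern that motivates the $\ell_2$ formulation.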
