Learning the kernel matrix in discriminant analysis via quadratically constrained quadratic programming

The kernel function plays a central role in kernel methods. In this paper, we consider the automated learning of the kernel matrix over a convex combination of pre-specified kernel matrices in Regularized Kernel Discriminant Analysis (RKDA), which performs linear discriminant analysis in the feature space via the kernel trick. Previous studies have shown that this kernel learning problem can be formulated as a semidefinite program (SDP), which is, however, computationally expensive even with recent advances in interior point methods. Based on the equivalence between RKDA and least squares problems in the binary-class case, we propose a Quadratically Constrained Quadratic Programming (QCQP) formulation for the kernel learning problem, which can be solved more efficiently than the SDP. While most existing work on kernel learning deals with binary-class problems only, we show that our QCQP formulation extends naturally to the multi-class case. Experimental results on both binary-class and multi-class benchmark data sets show the efficacy of the proposed QCQP formulations.
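To make the setting concrete, the following is a minimal sketch of the two ingredients named above: a convex combination of pre-specified kernel matrices, and a regularized least squares fit on the combined kernel for the binary-class case. It is not the QCQP formulation of the paper. The weights `theta` are fixed by hand rather than learned, and the names `rbf_kernel`, `combine_kernels`, `fit_kernel_least_squares`, `theta`, and `lam` are hypothetical, chosen only for illustration.

```python
import numpy as np


def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix for the rows of X (illustrative choice of kernel)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))


def combine_kernels(kernels, theta):
    """Return sum_i theta_i * K_i, requiring theta_i >= 0 and sum_i theta_i = 1."""
    theta = np.asarray(theta, dtype=float)
    if len(theta) != len(kernels):
        raise ValueError("need exactly one weight per kernel matrix")
    if np.any(theta < 0) or not np.isclose(theta.sum(), 1.0):
        raise ValueError("theta must be non-negative and sum to 1")
    return sum(t * K for t, K in zip(theta, kernels))


def fit_kernel_least_squares(K, y, lam):
    """Solve (HKH + lam * I) alpha = y - mean(y), with H the centering matrix.

    Assumption: a generic regularized kernel least squares fit on centered
    data, used here as a stand-in for the least squares problem that the
    abstract relates to RKDA in the binary-class case.
    """
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centers the data in feature space
    Kc = H @ K @ H
    yc = y - y.mean()
    return np.linalg.solve(Kc + lam * np.eye(n), yc)


# Toy usage: two RBF kernels of different widths, weights fixed by hand.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = np.where(X[:, 0] > 0, 1.0, -1.0)         # binary labels in {-1, +1}
kernels = [rbf_kernel(X, 0.1), rbf_kernel(X, 1.0)]
K = combine_kernels(kernels, [0.3, 0.7])
alpha = fit_kernel_least_squares(K, y, lam=0.1)
```

A kernel learning method would choose `theta` by optimization instead of fixing it. Because each candidate matrix is positive semidefinite and the weights are non-negative, the combined matrix is itself a valid kernel matrix.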
