Multi-class Discriminant Kernel Learning via Convex Programming

Regularized kernel discriminant analysis (RKDA) performs linear discriminant analysis in the feature space via the kernel trick. Its performance depends on the choice of kernel. In this paper, we consider the problem of multiple kernel learning (MKL) for RKDA, in which the optimal kernel matrix is obtained as a linear combination of pre-specified kernel matrices. We show that the kernel learning problem in RKDA can be formulated as convex programs. First, we show that this problem can be formulated as a semidefinite program (SDP). Then, based on the equivalence between RKDA and least squares problems in the binary-class case, we propose a convex quadratically constrained quadratic programming (QCQP) formulation for kernel learning in RKDA. A semi-infinite linear programming (SILP) formulation is derived to further improve efficiency. We extend these formulations to the multi-class case using a key result established in this paper: the multi-class RKDA kernel learning problem can be decomposed into a set of binary-class kernel learning problems that are constrained to share a common kernel. Based on this decomposition property, SDP formulations are proposed for the multi-class case, and the decomposition also leads naturally to QCQP and SILP formulations. Because the performance of RKDA depends on the regularization parameter, we show that this parameter can also be optimized jointly with the kernel. We conduct and analyze extensive experiments and discuss connections to other algorithms.
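
To make the setting concrete, the following is a minimal sketch of the two ingredients the abstract names: a kernel matrix formed as a linear combination of pre-specified kernel matrices, and a binary-class discriminant obtained through a regularized least squares problem on that combined kernel. It is illustrative only and is not the paper's formulation. The weights `theta` are supplied by hand rather than learned by an SDP, QCQP, or SILP solver; the constraint that they are nonnegative and sum to one, the RBF base kernels, the centering step, the ±1 label coding, and all function names are assumptions made for this example.

```python
import numpy as np


def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix; used here only as an example base kernel."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))


def combine_kernels(kernels, theta):
    """Return sum_i theta[i] * kernels[i].

    Assumption for this sketch: theta is nonnegative and sums to one,
    which keeps the combined matrix positive semidefinite.
    """
    theta = np.asarray(theta, dtype=float)
    if len(kernels) != theta.shape[0]:
        raise ValueError("need one weight per kernel matrix")
    if np.any(theta < 0) or not np.isclose(theta.sum(), 1.0):
        raise ValueError("weights must be nonnegative and sum to one")
    return sum(t * K for t, K in zip(theta, kernels))


def center_kernel(K):
    """Center the kernel matrix in feature space: H K H with H = I - (1/n) 11^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


def fit_binary_kernel_ls(K, y, lam):
    """Regularized least squares on a centered kernel with centered labels.

    y holds labels in {-1, +1}; lam > 0 is the regularization parameter.
    Returns (alpha, scores), where scores are the training projections.
    """
    n = K.shape[0]
    Kc = center_kernel(K)
    t = y - y.mean()
    alpha = np.linalg.solve(Kc + lam * np.eye(n), t)
    return alpha, Kc @ alpha
```

With two or more base kernels built from the same data (for example `rbf_kernel(X, 0.1)` and `rbf_kernel(X, 1.0)`), `combine_kernels` produces the combined matrix and `fit_binary_kernel_ls` returns the dual coefficients and training scores for a fixed choice of `theta` and `lam`. Choosing `theta` (and `lam`) by convex optimization is the problem the paper addresses and is outside this sketch.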
