Learning subspace kernels for classification

Kernel methods have been applied successfully to many data mining tasks. Subspace kernel learning was recently proposed to discover an effective low-dimensional subspace of a kernel feature space for improved classification. In this paper, we propose to construct a subspace kernel using the Hilbert-Schmidt Independence Criterion (HSIC), and we show that the optimal subspace kernel can be obtained efficiently by solving an eigenvalue problem. One limitation of existing subspace kernel learning formulations is that kernel learning and classification are performed independently, so the learned subspace kernel may not be optimally adapted to classification. To overcome this limitation, we propose a joint optimization framework in which the subspace kernel and the subsequent classifiers are learned simultaneously. In addition, we propose a novel learning formulation that extracts an uncorrelated subspace kernel to reduce redundant information in the subspace kernel. Following the idea of multiple kernel learning, we extend the proposed formulations to the case in which multiple kernels are available and need to be combined. We show that the integration of subspace kernels can be formulated as a semidefinite program (SDP), which is computationally expensive. To improve efficiency, we propose an equivalent semi-infinite linear program (SILP) formulation that can be solved efficiently by column generation. Experimental results on a collection of benchmark data sets demonstrate the effectiveness of the proposed algorithms.
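
The abstract does not state the exact objective, so the following is only a minimal sketch under an assumed formulation, not the paper's method. It uses the standard biased empirical HSIC estimate tr(K H L H)/(n-1)^2, with H = I - (1/n) 1 1^T the centering matrix and L a label kernel. It assumes the subspace is spanned by r kernel-space directions W = Phi A whose orthonormality reads A^T K A = I, so the subspace kernel on the training set is K A A^T K. Maximizing its HSIC with the labels then becomes the generalized eigenvalue problem K H L H K a = mu K a, solved here by whitening with the eigendecomposition of K. All function names (`rbf_kernel`, `label_kernel`, `hsic`, `hsic_subspace`) and parameters (`gamma`, `r`, `tol`) are hypothetical.

```python
import numpy as np


def rbf_kernel(X, Z, gamma=1.0):
    # Gaussian kernel k(x, z) = exp(-gamma * ||x - z||^2) between rows of X and Z.
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Z**2, axis=1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def label_kernel(y):
    # Linear kernel on one-hot labels: L[i, j] = 1 if y_i == y_j, else 0.
    classes = np.unique(y)
    Y = (y[:, None] == classes[None, :]).astype(float)
    return Y @ Y.T


def hsic(K, L):
    # Biased empirical HSIC estimate: tr(K H L H) / (n - 1)^2.
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2


def hsic_subspace(K, L, r, tol=1e-10):
    # ASSUMED formulation: maximize tr(A^T K H L H K A) s.t. A^T K A = I.
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    lam, U = np.linalg.eigh(K)
    keep = lam > tol * lam.max()              # drop the (numerical) null space of K
    U, s = U[:, keep], np.sqrt(lam[keep])
    # Substituting A = U diag(1/s) B turns the constraint into B^T B = I,
    # so B holds the top-r eigenvectors of M = diag(s) U^T H L H U diag(s).
    M = (s[:, None] * (U.T @ H @ L @ H @ U)) * s[None, :]
    mu, V = np.linalg.eigh(M)                 # eigenvalues in ascending order
    top = np.argsort(mu)[::-1][:r]
    A = U @ (V[:, top] / s[:, None])          # expansion coefficients, shape (n, r)
    return A, mu[top]


# Toy data: three Gaussian blobs in 5 dimensions, 30 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=1.0, size=(30, 5)) for c in (-2.0, 0.0, 2.0)])
y = np.repeat([0, 1, 2], 30)
n = X.shape[0]

K = rbf_kernel(X, X, gamma=0.1)
L = label_kernel(y)
A, mu = hsic_subspace(K, L, r=2)
Z = K @ A          # r-dimensional coordinates of the training points
K_s = Z @ Z.T      # subspace kernel K A A^T K on the training set
# New points would be projected as rbf_kernel(X_new, X, gamma=0.1) @ A.
print(f"HSIC(K, L) = {hsic(K, L):.4f}, HSIC(K_s, L) = {hsic(K_s, L):.4f}")
```

In this sketch the attained HSIC of the subspace kernel equals the sum of the r selected eigenvalues divided by (n-1)^2. Because the centered one-hot label matrix H Y has columns summing to zero, H L H has rank at most c - 1 for c classes, so at most c - 1 directions (here 2) carry nonzero HSIC.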
