Efficient hyperkernel learning using second-order cone programming

The kernel function plays a central role in kernel methods. Most existing methods can only adapt the kernel parameters or the kernel matrix based on empirical data. Recently, Ong et al. introduced the method of hyperkernels, which can be used to learn the kernel function directly in an inductive setting. However, the associated optimization problem is a semidefinite program (SDP), which is computationally very expensive even with recent advances in interior-point methods. In this paper, we show that this learning problem can be equivalently reformulated as a second-order cone program (SOCP), which can be solved more efficiently than an SDP. A comparison is also made with the kernel matrix learning method proposed by Lanckriet et al. Experimental results on both classification and regression problems, with toy and real-world data sets, show that the proposed SOCP formulation achieves a significant speedup over the original SDP formulation. Moreover, it yields better generalization than Lanckriet et al.'s method, at a speed comparable to, and sometimes even faster than, their quadratically constrained quadratic program (QCQP) formulation.
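
The reformulation hinges on a standard fact of conic programming rather than on new machinery. As an illustrative sketch (this is the generic identity, with placeholder variables $x \in \mathbb{R}^n$ and $y, z \in \mathbb{R}$, not the paper's exact constraint), a Schur-complement linear matrix inequality of the kind that appears in SDP formulations is equivalent to a single second-order cone constraint:

\[
\begin{pmatrix} y I & x \\ x^\top & z \end{pmatrix} \succeq 0
\;\Longleftrightarrow\;
x^\top x \le y z,\ y \ge 0,\ z \ge 0
\;\Longleftrightarrow\;
\left\| \begin{pmatrix} 2x \\ y - z \end{pmatrix} \right\|_2 \le y + z .
\]

When the semidefinite constraint of a problem has this structured form, it can be replaced by its second-order cone equivalent, so an interior-point SOCP solver (whose iterations are typically much cheaper than those of a general SDP solver) can be used instead.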

[1] Kristin P. Bennett et al. MARK: a boosting algorithm for heterogeneous kernel models, 2002, KDD.

[2] Yurii Nesterov et al. Interior-point polynomial algorithms in convex programming, 1994, SIAM Studies in Applied Mathematics.

[3] Kim-Chuan Toh et al. SDPT3 -- A Matlab Software Package for Semidefinite Programming, 1996.

[4] Olivier Bousquet et al. On the Complexity of Learning the Kernel Matrix, 2002, NIPS.

[5] Erling D. Andersen et al. On implementing a primal-dual interior-point method for conic quadratic optimization, 2003, Math. Program.

[6] G. Wahba. Spline models for observational data, 1990.

[7] Cheng Soon Ong et al. Machine learning using hyperkernels, 2003, ICML.

[8] Michael I. Jordan et al. Multiple kernel learning, conic duality, and the SMO algorithm, 2004, ICML.

[9] Alexander J. Smola et al. Learning the Kernel with Hyperkernels, 2005, J. Mach. Learn. Res.

[11] Catherine Blake et al. UCI Repository of machine learning databases, 1998.

[12] Sayan Mukherjee et al. Choosing Multiple Parameters for Support Vector Machines, 2002, Machine Learning.

[13] Kiyoshi Asai et al. Minimizing the Cross Validation Error to Mix Kernel Matrices of Heterogeneous Biological Data, 2004, Neural Processing Letters.

[14] M. Omair Ahmad et al. Optimizing the kernel in the empirical feature space, 2005, IEEE Transactions on Neural Networks.

[15] Nello Cristianini et al. Learning the Kernel Matrix with Semidefinite Programming, 2002, J. Mach. Learn. Res.

[16] James T. Kwok et al. The evidence framework applied to support vector machines, 2000, IEEE Transactions on Neural Networks.

[17] S. Sathiya Keerthi et al. An efficient method for computing leave-one-out error in support vector machines with Gaussian kernels, 2004, IEEE Transactions on Neural Networks.

[18] Tong Zhang et al. Some Sparse Approximation Bounds for Regression Problems, 2001, ICML.

[19] N. Cristianini et al. On Kernel-Target Alignment, 2001, NIPS.

[20] David R. Musicant et al. Lagrangian Support Vector Machines, 2001, J. Mach. Learn. Res.

[21] Bernhard Schölkopf et al. New Support Vector Algorithms, 2000, Neural Computation.

[22] G. Wahba. Support vector machines, reproducing kernel Hilbert spaces, and randomized GACV, 1999.

[23] Alexander J. Smola et al. Learning with kernels, 1998.

[24] Koby Crammer et al. Kernel Design Using Boosting, 2002, NIPS.

[25] Peter Sollich et al. Bayesian Methods for Support Vector Machines: Evidence and Predictive Class Probabilities, 2002, Machine Learning.

[26] John C. Platt et al. Fast training of support vector machines using sequential minimal optimization, in Advances in Kernel Methods, 1999.

[27] Nello Cristianini et al. Convex Methods for Transduction, 2003, NIPS.

[28] Zhihua Zhang et al. Bayesian inference for transductive learning of kernel matrix using the Tanner-Wong data augmentation algorithm, 2004, ICML.

[29] Vladimir Vapnik et al. Statistical learning theory, 1998.

[30] Dustin Boswell et al. Introduction to Support Vector Machines, 2002.

[31] Hans D. Mittelmann et al. An independent benchmarking of SDP and SOCP solvers, 2003, Math. Program.

[32] Donald Goldfarb et al. Second-order cone programming, 2003, Math. Program.

[33] Bernhard Schölkopf et al. Estimating the Support of a High-Dimensional Distribution, 2001, Neural Computation.

[34] Katya Scheinberg et al. Efficient SVM Training Using Low-Rank Kernel Representations, 2002, J. Mach. Learn. Res.

[35] Alexander J. Smola et al. Hyperkernels, 2002, NIPS.

[37] Alexander J. Smola et al. Machine Learning with Hyperkernels, 2003, ICML.