A Family of Simple Non-Parametric Kernel Learning Algorithms

Previous studies of Non-Parametric Kernel Learning (NPKL) usually formulate the learning task as a Semi-Definite Programming (SDP) problem, which is often solved by a general-purpose SDP solver. However, for N data examples, the time complexity of NPKL using a standard interior-point SDP solver can be as high as O(N^6.5), which makes NPKL methods impractical for real applications, even for data sets of moderate size. In this paper, we present a family of efficient NPKL algorithms, termed "SimpleNPKL", which can learn non-parametric kernels from a large set of pairwise constraints. In particular, we propose two SimpleNPKL algorithms. The first is SimpleNPKL with linear loss, which admits a closed-form solution that can be computed efficiently by the Lanczos sparse eigendecomposition technique. The second is SimpleNPKL with other loss functions (including square hinge loss, hinge loss, and square loss), which can be reformulated as a saddle-point optimization problem and solved by a fast iterative algorithm. Our empirical results show that, compared with previous NPKL approaches, the proposed technique maintains the same accuracy while being significantly more efficient and scalable. Finally, we demonstrate that the proposed technique can also speed up many kernel learning tasks, including colored maximum variance unfolding, minimum volume embedding, and structure preserving embedding.
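
The role of sparse eigendecomposition in the linear-loss case can be illustrated with a minimal sketch. It is not the SimpleNPKL algorithm itself: the abstract does not state the closed-form solution, so the code only shows the general idea of building a sparse symmetric matrix from pairwise constraints and keeping its leading positive eigenpairs to obtain a low-rank positive semi-definite kernel. The helper name `psd_kernel_from_constraints`, the +1/-1 encoding of must-link and cannot-link pairs, and the rank parameter `k` are assumptions made for illustration. The eigenpairs are computed with SciPy's `eigsh`, which wraps ARPACK's implicitly restarted Lanczos method.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.linalg import eigsh


def psd_kernel_from_constraints(n, pairs, labels, k=2):
    """Hypothetical helper (not from the paper): build a rank-<=k PSD kernel.

    pairs  : list of (i, j) index pairs carrying a constraint
    labels : +1 for a must-link pair, -1 for a cannot-link pair (assumed encoding)
    """
    # Sparse symmetric constraint matrix: A[i, j] = A[j, i] = label.
    rows = [i for i, _ in pairs] + [j for _, j in pairs]
    cols = [j for _, j in pairs] + [i for i, _ in pairs]
    vals = list(labels) + list(labels)
    A = coo_matrix((vals, (rows, cols)), shape=(n, n), dtype=float).tocsr()

    # Lanczos iteration for the k largest (algebraic) eigenpairs only,
    # which avoids a full dense eigendecomposition of A.
    lam, V = eigsh(A, k=k, which="LA")

    # Discard non-positive eigenvalues so the result is positive semi-definite.
    keep = lam > 1e-10
    lam, V = lam[keep], V[:, keep]
    return (V * lam) @ V.T


# Toy usage: points 0-2 and 3-5 form two groups.
pairs = [(0, 1), (1, 2), (3, 4), (4, 5), (0, 3), (2, 5)]
labels = [1, 1, 1, 1, -1, -1]
K = psd_kernel_from_constraints(6, pairs, labels, k=2)
```

In this toy example the learned kernel assigns positive similarity to the must-link pair (0, 1) and negative similarity to the cannot-link pair (0, 3). Because only `k` eigenpairs of a sparse matrix are needed, the cost stays far below that of a general-purpose SDP solver, which is the efficiency argument the abstract makes.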
