Reduced Set KPCA for Improving the Training and Execution Speed of Kernel Machines

This paper presents a practical and theoretically well-founded approach to improving the speed of kernel manifold learning algorithms that rely on spectral decomposition. Building on recent insights into kernel smoothing and learning with integral operators, we propose Reduced Set KPCA (RSKPCA), which also yields an easy-to-implement method for removing or replacing samples with minimal effect on the empirical operator. A simple data-point selection procedure generates a substitute density for the data, with accuracy governed by a user-tunable parameter ℓ. The effect of this approximation on the quality of the KPCA solution, in terms of spectral and operator errors, can be bounded directly in terms of the density-estimate error and as a function of ℓ. Experiments show that RSKPCA can improve both the training and evaluation time of KPCA by up to an order of magnitude, and that it compares favorably with the widely used Nyström and density-weighted Nyström methods.
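To make the speedup mechanism concrete, the sketch below shows generic KPCA restricted to a reduced set Z of m points (m ≪ n): the eigendecomposition is performed on the m × m kernel matrix over Z rather than the full n × n matrix, and all of X is embedded through the Nyström-style out-of-sample formula. This is an illustration of the general reduced-set idea only; the paper's density-based procedure for selecting Z (with accuracy controlled by ℓ) and its error analysis are not reproduced here, and the uniform subsampling used to form Z is a placeholder assumption.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # Pairwise Gaussian kernel matrix between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def reduced_set_kpca(X, Z, n_components=2, sigma=1.0):
    """KPCA computed on a reduced set Z, then applied to all of X.

    Illustrative sketch: eigendecompose the centered m x m kernel
    matrix on Z (cost O(m^3) instead of O(n^3)), then project X with
    the standard out-of-sample formula.
    """
    m = Z.shape[0]
    K = gaussian_kernel(Z, Z, sigma)
    # Center the kernel matrix in feature space: Kc = H K H.
    H = np.eye(m) - np.ones((m, m)) / m
    Kc = H @ K @ H
    w, V = np.linalg.eigh(Kc)                    # ascending eigenvalues
    idx = np.argsort(w)[::-1][:n_components]     # keep the top components
    w, V = w[idx], V[:, idx]
    alpha = V / np.sqrt(np.maximum(w, 1e-12))    # normalized coefficients
    # Cross-kernel between all points and the reduced set, centered
    # consistently with Kc.
    Kx = gaussian_kernel(X, Z, sigma)
    Kx_c = (Kx - Kx.mean(axis=1, keepdims=True)
               - K.mean(axis=0, keepdims=True) + K.mean())
    return Kx_c @ alpha                          # n x n_components embedding

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
# Placeholder reduced set: uniform subsample (the paper selects Z by a
# density-based criterion instead).
Z = X[rng.choice(500, size=50, replace=False)]
Y = reduced_set_kpca(X, Z, n_components=2)
print(Y.shape)  # (500, 2)
```

Both training (one m × m eigendecomposition) and evaluation (n × m kernel evaluations instead of n × n) scale with the reduced-set size m, which is where the order-of-magnitude savings reported in the abstract come from.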
