Denoising and Dimension Reduction in Feature Space

We show that the relevant information about a classification problem in feature space is contained, up to negligible error, in a finite number of leading kernel PCA components if the kernel matches the underlying learning problem. Thus, kernels not only transform data sets such that good generalization can be achieved even by linear discriminant functions, but this transformation is also performed in a manner which makes economical use of feature space dimensions. In the best case, kernels provide efficient implicit representations of the data for classification. Practically, we propose an algorithm which enables us to recover the subspace and dimensionality relevant for good classification. Our algorithm can therefore be applied (1) to analyze the interplay of data set and kernel in a geometric fashion, (2) to assist in model selection, and (3) to denoise in feature space in order to yield better classification results.
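To make the idea concrete, the following is a minimal sketch (not the authors' exact algorithm) of how one might probe the relevant dimensionality: project the centered label vector onto the kernel PCA eigenbasis and measure how much label information each truncation retains. The RBF bandwidth gamma, the 95% threshold, the toy data, and all function names are illustrative assumptions.

import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Pairwise squared Euclidean distances -> Gaussian kernel matrix.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * d2)

def kernel_pca_basis(K):
    # Center the kernel matrix in feature space, then eigendecompose.
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ K @ H
    evals, evecs = np.linalg.eigh(Kc)
    order = np.argsort(evals)[::-1]          # leading components first
    return evals[order], evecs[:, order]

def label_energy_in_leading_components(y, evecs):
    # Fraction of the centered label vector captured by the first d
    # kernel PCA directions, for every d = 1, ..., n.
    yc = y - y.mean()
    coeffs = evecs.T @ yc                    # coordinates of y in the eigenbasis
    return np.cumsum(coeffs**2) / np.sum(yc**2)

# Toy two-class problem: the labels depend only on the first coordinate,
# so the relevant subspace should be low-dimensional.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=200))

K = rbf_kernel(X, gamma=0.5)
evals, evecs = kernel_pca_basis(K)
energy = label_energy_in_leading_components(y, evecs)
d = int(np.searchsorted(energy, 0.95)) + 1   # smallest d capturing 95% of the labels
print(f"~95% of the label information lies in the first {d} components")

If the kernel matches the learning problem, the cumulative energy curve saturates after a few components; denoising then amounts to discarding the trailing directions before training a (linear) classifier in feature space.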
