Feature vector selection and projection using kernels

This paper provides new insight into kernel methods by using data selection. The kernel trick is used to select from the data a relevant subset forming a basis in a feature space F. The selected vectors thus define a subspace in F. The data are then projected onto this subspace, where classical algorithms are applied. We show that kernel methods like generalized discriminant analysis (Neural Comput. 12 (2000) 2385) or kernel principal component analysis (Neural Comput. 10 (1998) 1299) can be expressed more easily. Moreover, it turns out that the size of the basis is related to the complexity of the model. Therefore, data selection leads to complexity control and thus to better generalization. The approach covers a wide range of algorithms. We investigate function approximation on real classification problems and on a regression problem.
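
To make the selection and projection steps concrete, here is a minimal NumPy sketch of a greedy feature vector selection followed by projection onto the span of the selected vectors. It assumes a Gaussian RBF kernel; the function names (`rbf_kernel`, `select_feature_vectors`, `project`), the stopping tolerance `tol`, and the use of an inverse square root of the basis Gram matrix for the projection coordinates are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian RBF kernel matrix between the rows of X and the rows of Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def select_feature_vectors(K, max_size=20, tol=1e-3):
    """Greedily pick indices whose images form a basis of a subspace of F.

    K is the full Gram matrix of the training data. At each step the
    candidate that most increases the mean normalized squared projection
    of all samples onto the span of the selected vectors is added; the
    loop stops when the gain drops below `tol` or `max_size` is reached.
    """
    n = K.shape[0]
    diag = np.diag(K)                      # k(x_i, x_i), assumed > 0
    selected, prev_fitness = [], 0.0
    for _ in range(max_size):
        best_idx, best_fitness = None, -np.inf
        for j in range(n):
            if j in selected:
                continue
            S = selected + [j]
            K_SS = K[np.ix_(S, S)]         # Gram matrix of the candidate basis
            K_Sx = K[S, :]                 # basis-versus-all kernel block
            # squared length of each sample's projection onto span{phi(x_s)}
            coef = np.linalg.lstsq(K_SS, K_Sx, rcond=None)[0]
            proj = np.einsum('ij,ij->j', K_Sx, coef)
            fitness = np.mean(proj / diag)
            if fitness > best_fitness:
                best_idx, best_fitness = j, fitness
        if best_idx is None or best_fitness - prev_fitness < tol:
            break                          # the basis no longer improves
        selected.append(best_idx)
        prev_fitness = best_fitness
    return selected

def project(K_Sx, K_SS):
    """Coordinates of the data in an orthonormal basis of the selected subspace."""
    w, V = np.linalg.eigh(K_SS)            # K_SS^{-1/2} via eigendecomposition
    w = np.maximum(w, 1e-12)
    return (V / np.sqrt(w)) @ V.T @ K_Sx   # shape (|S|, n)

# Usage: select a basis on the training set, then run any linear method
# (LDA, PCA, least squares, ...) on the columns of Z.
X = np.random.randn(100, 5)
K = rbf_kernel(X, X, gamma=0.5)
S = select_feature_vectors(K, max_size=10)
Z = project(K[S, :], K[np.ix_(S, S)])
```

In this representation each sample is described by |S| coordinates, so the size of the selected basis plays the role of the model complexity discussed above.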

[1] G. Baudat, et al. Generalized Discriminant Analysis Using a Kernel Approach, 2000, Neural Computation.

[2] Vladimir N. Vapnik, et al. The Nature of Statistical Learning Theory, 2000, Statistics for Engineering and Information Science.

[3] Peter E. Hart, et al. Pattern classification and scene analysis, 1974, A Wiley-Interscience publication.

[4] Bernhard E. Boser, et al. A training algorithm for optimal margin classifiers, 1992, COLT '92.

[5] G. Baudat, et al. Kernel-based methods and function approximation, 2001, IJCNN'01, International Joint Conference on Neural Networks, Proceedings (Cat. No.01CH37222).

[6] Bart Kosko, et al. Neural networks for signal processing, 1992.

[7] Bernhard Schölkopf, et al. Improving the accuracy and speed of support vector learning machines, 1997, NIPS.

[8] M. Aizerman, et al. Theoretical Foundations of the Potential Function Method in Pattern Recognition Learning, 1964.

[9] Alexander J. Smola, et al. Support Vector Method for Function Approximation, Regression Estimation and Signal Processing, 1996, NIPS.

[10] Alexander J. Smola, et al. Learning with kernels, 1998.

[11] J. Mercer. Functions of Positive and Negative Type, and their Connection with the Theory of Integral Equations, 1909.

[12] Bernhard Schölkopf, et al. Learning with kernels, 2001.

[13] Vladimir Vapnik. Methods of Function Estimation, 2000.

[14] Bernhard Schölkopf, et al. Improving the Accuracy and Speed of Support Vector Machines, 1996, NIPS.

[15] B. Schölkopf, et al. Fisher discriminant analysis with kernels, 1999, Neural Networks for Signal Processing IX: Proceedings of the 1999 IEEE Signal Processing Society Workshop (Cat. No.98TH8468).

[16] Bernhard Schölkopf, et al. Sparse Greedy Matrix Approximation for Machine Learning, 2000, International Conference on Machine Learning.

[17] Richard O. Duda, et al. Pattern classification and scene analysis, 1974, A Wiley-Interscience publication.

[18] Michael E. Tipping. The Relevance Vector Machine, 1999, NIPS.

[19] T. Poggio, et al. On optimal nonlinear associative recall, 1975, Biological Cybernetics.

[20] Vladimir Vapnik, et al. The Nature of Statistical Learning, 1995.

[21] Bernhard Schölkopf, et al. Nonlinear Component Analysis as a Kernel Eigenvalue Problem, 1998, Neural Computation.

[22] Gunnar Rätsch, et al. An introduction to kernel-based learning algorithms, 2001, IEEE Trans. Neural Networks.

[23] Keinosuke Fukunaga, et al. Introduction to Statistical Pattern Recognition, 1972.

[24] Fouad Badran, et al. Probabilistic self-organizing map and radial basis function networks, 1998, Neurocomputing.

[25] Keinosuke Fukunaga, et al. Introduction to Statistical Pattern Recognition (2nd ed.), 1990.

[26] Gunnar Rätsch, et al. Input space versus feature space in kernel-based methods, 1999, IEEE Trans. Neural Networks.