A support vector machine formulation to PCA analysis and its kernel version

In this paper, we present a simple and straightforward primal-dual support vector machine formulation of the problem of principal component analysis (PCA) in dual variables. By considering a mapping to a high-dimensional feature space and applying the kernel trick (Mercer's theorem), kernel PCA is obtained as introduced by Schölkopf et al. (1998). While least squares support vector machine classifiers have a natural link with kernel Fisher discriminant analysis (minimizing the within-class scatter around targets +1 and -1), PCA can be interpreted as a one-class modeling problem with a zero target value around which the variance is maximized. The score variables are interpreted as error variables within the problem formulation. In this way, primal-dual constrained optimization interpretations of linear and kernel PCA are obtained in a similar style as for least squares support vector machine classifiers.
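To make the constrained optimization interpretation concrete, the following is a minimal sketch, not taken verbatim from the paper; the notation follows the least squares support vector machine style described above, and the regularization constant \gamma, feature map \varphi, and feature-space mean \hat{\mu}_\varphi are assumptions of this sketch:

\max_{w,e}\; J(w,e) = \frac{\gamma}{2}\sum_{i=1}^{N} e_i^{2} - \frac{1}{2}\, w^{\top} w
\quad\text{subject to}\quad e_i = w^{\top}\bigl(\varphi(x_i) - \hat{\mu}_\varphi\bigr),\quad i = 1,\dots,N.

Eliminating w and the error (score) variables e_i through the Lagrangian yields an eigenvalue problem in the dual variables \alpha, namely \Omega_c \alpha = \lambda \alpha, where \Omega_c is the centered kernel (Gram) matrix obtained via Mercer's theorem. A short numerical sketch of this dual step in Python follows; the RBF kernel choice, the function name kernel_pca_scores, and the two-ring toy data are illustrative assumptions, not part of the original formulation.

import numpy as np

def kernel_pca_scores(X, gamma_rbf=1.0, n_components=2):
    """Kernel PCA as an eigenvalue problem on the centered kernel matrix.

    Sketch of the dual step: eigenvectors alpha of the centered Gram
    matrix Omega_c give the score (error) variables as projections.
    """
    n = X.shape[0]
    # RBF kernel: Omega[i, j] = exp(-gamma_rbf * ||x_i - x_j||^2)
    sq = np.sum(X ** 2, axis=1)
    Omega = np.exp(-gamma_rbf * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))
    # Centering in feature space: Omega_c = M Omega M with M = I - (1/n) 1 1^T
    M = np.eye(n) - np.ones((n, n)) / n
    Omega_c = M @ Omega @ M
    # Dual eigenvalue problem: Omega_c alpha = lambda alpha
    eigvals, eigvecs = np.linalg.eigh(Omega_c)
    order = np.argsort(eigvals)[::-1][:n_components]
    lam, alpha = eigvals[order], eigvecs[:, order]
    # Normalize alpha so each component has unit norm in feature space
    alpha = alpha / np.sqrt(np.maximum(lam, 1e-12))
    # Score variables: projections of the mapped training points
    scores = Omega_c @ alpha
    return lam, alpha, scores

# Toy usage: two noisy concentric rings, a case where linear PCA
# cannot separate the two scales but kernel PCA can.
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 2.0 * np.pi, 200)
r = np.repeat([1.0, 3.0], 100) + 0.05 * rng.standard_normal(200)
X = np.c_[r * np.cos(t), r * np.sin(t)]
lam, alpha, scores = kernel_pca_scores(X, gamma_rbf=2.0)
print(scores.shape)  # (200, 2)

The score variables returned here play the role of the error variables e_i in the constrained problem above: projections of each mapped point onto the principal directions in feature space.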

[1] Tomaso A. Poggio, et al. Regularization Networks and Support Vector Machines, 2000, Advances in Computational Mathematics.

[2] G. Baudat, et al. Generalized Discriminant Analysis Using a Kernel Approach, 2000, Neural Computation.

[3] Vladimir N. Vapnik. The Nature of Statistical Learning Theory, 2000, Statistics for Engineering and Information Science.

[4] Bernhard Schölkopf, et al. Learning with Kernels, 2001.

[5] Johan A. K. Suykens, et al. Bayesian Framework for Least-Squares Support Vector Machine Classifiers, Gaussian Processes, and Kernel Fisher Discriminant Analysis, 2002, Neural Computation.

[6] Heekuck Oh, et al. Neural Networks for Pattern Recognition, 1993, Advances in Computers.

[7] Mark Girolami. Orthogonal Series Density Estimation and the Kernel Eigenvalue Problem, 2002, Neural Computation.

[8] Harold Hotelling. Simplified calculation of principal components, 1936.

[9] Johan A. K. Suykens, et al. Optimal control by least squares support vector machines, 2001, Neural Networks.

[10] B. Schölkopf, et al. Fisher discriminant analysis with kernels, 1999, Neural Networks for Signal Processing IX: Proceedings of the 1999 IEEE Signal Processing Society Workshop.

[11] J. Gower. Some distance properties of latent root and vector methods used in multivariate analysis, 1966.

[12] Alexander Gammerman, et al. Ridge Regression Learning Algorithm in Dual Variables, 1998, ICML.

[13] Matthias W. Seeger, et al. Using the Nyström Method to Speed Up Kernel Machines, 2000, NIPS.

[14] G. Wahba. Spline Models for Observational Data, 1990.

[15] Gene H. Golub, et al. Matrix Computations, 1983.

[16] H. Hotelling. Relations Between Two Sets of Variates, 1936.

[17] Johan A. K. Suykens, et al. Weighted least squares support vector machines: robustness and sparse approximation, 2002, Neurocomputing.

[18] Johan A. K. Suykens, et al. Least Squares Support Vector Machine Classifiers, 1999, Neural Processing Letters.

[19] Carl E. Rasmussen, et al. In Advances in Neural Information Processing Systems, 2011.

[20] F. Girosi, et al. Networks for approximation and learning, 1990, Proceedings of the IEEE.

[21] Bernhard Schölkopf, et al. Nonlinear Component Analysis as a Kernel Eigenvalue Problem, 1998, Neural Computation.

[22] Keinosuke Fukunaga. Introduction to Statistical Pattern Recognition, 1972.

[23] Sun-Yuan Kung, et al. Principal Component Neural Networks: Theory and Applications, 1996.

[24] Gunnar Rätsch, et al. Input space versus feature space in kernel-based methods, 1999, IEEE Transactions on Neural Networks.

[25] Karl Pearson. On lines and planes of closest fit to systems of points in space, 1901.