Nonlinear kernel-based statistical pattern analysis

The eigenstructure of the second-order statistics of a multivariate random population can be inferred from the matrix of inner products between all pairs of samples (the Gram matrix). Therefore, it can also be obtained efficiently in the implicit, high-dimensional feature spaces defined by kernel functions. We build on this property to derive general expressions that immediately yield nonlinear counterparts of a number of standard pattern analysis algorithms, including principal component analysis, data compression and denoising, and Fisher's discriminant. We also illustrate the connection between kernel methods and nonparametric density estimation. Using these results, we introduce a kernel version of the Mahalanobis distance, which gives rise to nonparametric models with unexpected and interesting properties, and we propose a kernel version of the minimum squared error (MSE) linear discriminant function. This learning machine is particularly simple and includes as special cases several generalized linear models, such as the potential function method and the radial basis function (RBF) network. Our results shed some light on the relative contributions of the feature space and the inductive bias to the remarkable generalization properties of the support vector machine (SVM). Although the SVM obtains the lowest error rates in most situations, exhaustive experiments with synthetic and natural data show that simple kernel machines based on pseudoinversion are competitive in problems with appreciable class overlap.
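The opening claim, that the second-order eigenstructure can be recovered from the Gram matrix alone, is easy to check numerically. The following is a minimal NumPy sketch, not the authors' code: it centers the Gram matrix in feature space, eigendecomposes it, and divides the eigenvalues by n to obtain covariance eigenvalues. With a linear kernel these should match the spectrum of the ordinary sample covariance; with a Gaussian kernel the same routine performs PCA in the implicit feature space. The helper names (`center_gram`, `rbf_gram`, `kernel_pca`) and the kernel width `sigma` are illustrative assumptions.

```python
import numpy as np

def center_gram(K):
    """Center a Gram matrix in feature space: Kc = H K H, with H = I - (1/n) 1 1^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def rbf_gram(X, sigma=1.0):
    """Gaussian Gram matrix K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)); sigma is an assumed width."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma**2))

def kernel_pca(K, n_components):
    """Leading feature-space covariance eigenvalues and projections of the training samples.

    Assumes the n_components leading eigenvalues of the centered Gram matrix are strictly positive.
    """
    n = K.shape[0]
    Kc = center_gram(K)
    vals, vecs = np.linalg.eigh(Kc)              # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[order], vecs[:, order]
    # Scale each unit eigenvector by 1/sqrt(lambda) so that the implied feature-space
    # direction v = sum_i a_i phi_c(x_i) has unit norm (||v||^2 = a^T Kc a = 1).
    coeffs = vecs / np.sqrt(vals)
    projections = Kc @ coeffs                    # projections of the centered samples onto each v
    return vals / n, projections                 # covariance eigenvalues = Gram eigenvalues / n

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3)) @ np.array([[2.0, 0.0, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 0.3]])

# Linear kernel: the Gram-matrix route reproduces the ordinary covariance spectrum.
cov_vals, _ = kernel_pca(X @ X.T, n_components=3)
direct = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False, bias=True)))[::-1]
print(np.allclose(cov_vals, direct))             # True

# Gaussian kernel: the same code performs PCA in the implicit feature space.
rbf_vals, Z = kernel_pca(rbf_gram(X, sigma=2.0), n_components=2)
print(Z.shape)                                   # (50, 2)
```

The check works because the nonzero eigenvalues of the centered Gram matrix X_c X_c^T coincide with those of X_c^T X_c, so only inner products are ever needed; replacing them with kernel evaluations is what moves the computation into feature space.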

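The kernel MSE discriminant can be sketched as ordinary least squares on a design matrix whose columns are kernel evaluations at the training samples, solved by pseudoinversion. This is a minimal sketch under stated assumptions: ±1 class targets, a Gaussian kernel of hypothetical width `sigma`, an unregularized bias term, and the hypothetical helper names `fit_kernel_mse` and `decision`. The paper's exact formulation (for example, any regularization or kernel choice) may differ.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix k(a, b) = exp(-||a - b||^2 / (2 sigma^2)) between rows of A and B."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def fit_kernel_mse(X, y, sigma=1.0):
    """Minimum-squared-error fit of f(x) = sum_i a_i k(x_i, x) + b to targets y in {-1, +1}.

    The design matrix has one kernel column per training sample plus a bias column;
    the pseudoinverse gives the minimum-norm least-squares solution.
    """
    A = np.hstack([rbf_kernel(X, X, sigma), np.ones((len(X), 1))])
    w = np.linalg.pinv(A) @ y
    return w[:-1], w[-1]

def decision(X_train, a, b, X_new, sigma=1.0):
    """Signed output f(x) for new points; the predicted class is its sign."""
    return rbf_kernel(X_new, X_train, sigma) @ a + b

# XOR: not linearly separable, but solved by the kernel MSE machine.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([-1.0, 1.0, 1.0, -1.0])
a, b = fit_kernel_mse(X, y)
print(np.sign(decision(X, a, b, X)))                                   # [-1.  1.  1. -1.]
print(np.sign(decision(X, a, b, np.array([[0.1, 0.1], [0.9, 0.1]]))))  # [-1.  1.]
```

With a Gaussian kernel the fitted function has the form of an RBF network with one center per training sample, one of the special cases named above. By contrast, a purely linear MSE fit on the same four points returns all-zero outputs, because y is orthogonal to every column of the linear design matrix [X 1].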