An Optimization Perspective on Kernel Partial Least Squares Regression

This work provides a novel optimization-based derivation of the partial least squares (PLS) algorithm for linear regression and of the kernel partial least squares (K-PLS) algorithm for nonlinear regression. The derivation makes the PLS algorithm, widely and successfully used in chemometrics applications, more accessible to machine learning researchers. The work also introduces Direct K-PLS, a novel way to kernelize PLS based on a direct factorization of the kernel matrix. Computational results and discussion illustrate the relative merits of K-PLS and Direct K-PLS versus closely related kernel methods such as support vector machines and kernel ridge regression.

∗This work was supported by NSF grant number IIS-9979860. Many thanks to Roman Rosipal, Nello Cristianini, and Johan Suykens for many helpful discussions on PLS and kernel methods; to Sean Ekins of Concurrent Pharmaceuticals for providing molecule descriptions for the Albumin data set; to Curt Breneman and N. Sukumar for generating descriptors for the Albumin data; and to Tony Van Gestel for an efficient Gaussian kernel implementation algorithm. This work appears in J.A.K. Suykens, G. Horvath, S. Basu, C. Micchelli, and J. Vandewalle (Eds.), Advances in Learning Theory: Methods, Models and Applications, NATO Science Series III: Computer & Systems Sciences, Volume 190, IOS Press, Amsterdam, 2003, pp. 227-250.
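For machine learning readers meeting PLS for the first time, a compact sketch may help fix ideas. The snippet below is a minimal NumPy implementation of NIPALS-style kernel PLS in the spirit of Rosipal and Trejo's formulation, not the paper's own code; it assumes a precomputed kernel (Gram) matrix that has already been centered in feature space, and the names kpls_fit and kpls_predict are illustrative.

```python
import numpy as np

def kpls_fit(K, Y, n_components, max_iter=100, tol=1e-10):
    # NIPALS-style kernel PLS (in the spirit of Rosipal & Trejo, 2001).
    # K: (n, n) training Gram matrix, assumed already centered in feature space.
    # Y: (n, p) response matrix.  Returns the score matrices T and U.
    n = K.shape[0]
    Kd, Yd = K.copy(), Y.astype(float).copy()   # deflated working copies
    T, U = [], []
    for _ in range(n_components):
        u = Yd[:, :1].copy()                    # initialize from first response column
        for _ in range(max_iter):
            t = Kd @ u                          # latent score in feature space
            t /= np.linalg.norm(t)
            c = Yd.T @ t                        # response loadings
            u_new = Yd @ c
            u_new /= np.linalg.norm(u_new)
            if np.linalg.norm(u_new - u) < tol:
                u = u_new
                break
            u = u_new
        T.append(t)
        U.append(u)
        D = np.eye(n) - t @ t.T                 # deflate by the extracted score t
        Kd = D @ Kd @ D
        Yd = Yd - t @ (t.T @ Yd)
    return np.hstack(T), np.hstack(U)

def kpls_predict(K_test, K_train, Y, T, U):
    # K_test: (q, n) kernel between test and training points, centered consistently.
    # Dual regression coefficients: B = U (T' K U)^{-1} T' Y.
    B = U @ np.linalg.solve(T.T @ K_train @ U, T.T @ Y)
    return K_test @ B
```

Direct K-PLS, as described in the abstract, kernelizes PLS through a direct factorization of the kernel matrix. One plausible minimal reading, sketched below under that assumption, is to run ordinary linear PLS1 with K playing the role of the data matrix; the function direct_kpls and its details are illustrative, not taken from the paper.

```python
def direct_kpls(K, y, K_test, n_components):
    # Direct K-PLS sketch: ordinary linear PLS1 (NIPALS) run with the kernel
    # matrix K in the role of the data matrix X, so a factorization of K
    # itself is computed.  y: (n,) single response; K_test: (q, n).
    X, r = K.astype(float).copy(), y.astype(float).copy()
    W, P, b = [], [], []
    for _ in range(n_components):
        w = X.T @ r
        w /= np.linalg.norm(w)                  # weight vector
        t = X @ w                               # score vector
        tt = t @ t
        p = X.T @ t / tt                        # loading vector
        coef = (t @ r) / tt                     # inner regression coefficient
        X = X - np.outer(t, p)                  # deflate the kernel "data" matrix
        r = r - coef * t                        # deflate the residual response
        W.append(w); P.append(p); b.append(coef)
    W, P, b = np.column_stack(W), np.column_stack(P), np.array(b)
    beta = W @ np.linalg.solve(P.T @ W, b)      # beta = W (P'W)^{-1} b
    return K_test @ beta
```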
