Kernel Partial Least Squares is Universally Consistent

We prove the statistical consistency of kernel Partial Least Squares Regression applied to a bounded regression learning problem on a reproducing kernel Hilbert space. Partial Least Squares stands out from well-known classical approaches such as Ridge Regression or Principal Components Regression: it is not defined as the solution of a global cost minimization procedure over a fixed model, nor is it a linear estimator. Instead, approximate solutions are constructed by projections onto a nested set of data-dependent subspaces. To prove consistency, we exploit the known fact that Partial Least Squares is equivalent to the conjugate gradient algorithm combined with early stopping. The choice of the stopping rule (the number of iterations) is crucial. We study two empirical stopping rules: the first monitors the estimation error in each iteration step of Partial Least Squares, and the second estimates the empirical complexity in terms of a condition number. Both stopping rules lead to universally consistent estimators, provided the kernel is universal.
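
To make the conjugate gradient connection concrete, the following is a minimal Python sketch, not the paper's implementation: it runs plain CG on the kernel system K α = y (an illustrative choice; the exact system and normalization used in the paper may differ) and stops early once the empirical residual falls below a threshold, loosely mimicking the first, error-monitoring stopping rule. The Gaussian kernel, the threshold `tau`, and `max_iter` are assumptions made for the example only.

```python
# Hedged sketch: kernel regression fitted by conjugate gradient (CG) with
# early stopping. The CG iterates live in a growing Krylov subspace of the
# kernel matrix, mirroring the nested, data-dependent subspaces onto which
# kernel PLS projects. The stopping test is a crude stand-in for an
# error-monitoring rule, not the rule analyzed in the paper.

import numpy as np


def gaussian_kernel(X, Z, gamma=1.0):
    """Gaussian (universal) kernel matrix between the rows of X and Z."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)


def kernel_cg_early_stopping(K, y, tau=1e-3, max_iter=50):
    """Plain CG on K @ alpha = y, stopped once the scaled residual norm
    (the empirical error of the current fit) drops below tau."""
    n = len(y)
    alpha = np.zeros(n)
    r = y - K @ alpha          # residual of the current fit
    p = r.copy()
    for _ in range(max_iter):
        Kp = K @ p
        step = (r @ r) / (p @ Kp)
        alpha = alpha + step * p
        r_new = r - step * Kp
        if np.linalg.norm(r_new) / np.sqrt(n) < tau:   # early stopping (sketch)
            break
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
    return alpha


# Illustrative usage on synthetic data.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 2))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=100)
K = gaussian_kernel(X, X)
alpha = kernel_cg_early_stopping(K, y)
y_hat = K @ alpha              # in-sample fit; new points: gaussian_kernel(X_new, X) @ alpha
```

Here the number of CG iterations plays the role of the number of PLS components, so the stopping threshold acts as the model-selection parameter in place of a regularization constant.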
