Optimal Rates for the Regularized Least-Squares Algorithm

We develop a theoretical analysis of the performance of the regularized least-squares algorithm on a reproducing kernel Hilbert space in the supervised learning setting. The results hold in the general framework of vector-valued functions and therefore apply to multi-task problems. In particular, we observe that the effective dimension plays a central role in defining a criterion for choosing the regularization parameter as a function of the number of samples. Moreover, a complete minimax analysis of the problem is given, showing that the convergence rates attained by regularized least-squares estimators are indeed optimal over a suitable class of priors defined by the considered kernel. Finally, we give an improved lower-rate result describing the worst asymptotic behavior on individual probability measures rather than over classes of priors.
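For concreteness, the following is a minimal, self-contained sketch (not taken from the paper) of the regularized least-squares estimator and of an empirical version of the effective dimension; the Gaussian kernel, the synthetic scalar-valued data, and the particular regularization value lam are illustrative assumptions rather than the paper's prescribed choices.

```python
import numpy as np

def gaussian_kernel(X1, X2, sigma=1.0):
    """Gaussian (RBF) kernel matrix between two sets of inputs (assumed kernel)."""
    d2 = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return np.exp(-d2 / (2 * sigma**2))

def rls_fit(X, y, lam, sigma=1.0):
    """Regularized least-squares (kernel ridge regression) estimator:
    coefficients c solve (K + n*lam*I) c = y, so f(x) = sum_i c_i k(x, x_i)."""
    n = X.shape[0]
    K = gaussian_kernel(X, X, sigma)
    c = np.linalg.solve(K + n * lam * np.eye(n), y)
    return c, K

def effective_dimension(K, lam):
    """Empirical effective dimension: trace of K (K + n*lam*I)^{-1}."""
    n = K.shape[0]
    return np.trace(K @ np.linalg.inv(K + n * lam * np.eye(n)))

# Toy usage on synthetic scalar-valued data.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(200)
c, K = rls_fit(X, y, lam=1e-3)
print("effective dimension at lam=1e-3:", effective_dimension(K, 1e-3))
```

In this sketch the regularization parameter is fixed by hand; in the analysis described above it would instead be chosen as a function of the number of samples, with the effective dimension governing that choice.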
