Model Selection for Regularized Least-Squares Algorithm in Learning Theory

Abstract. We investigate the problem of model selection for learning algorithms that depend on a continuous parameter. We propose a model selection procedure based on a worst-case analysis and on a data-independent choice of the parameter. For the regularized least-squares algorithm we bound the generalization error of the solution by a quantity depending on a few known constants, and we show that the corresponding model selection procedure reduces to solving a bias-variance problem. Under suitable smoothness conditions on the regression function, we estimate the optimal parameter as a function of the number of data points and we prove that this choice ensures consistency of the algorithm.
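As a concrete illustration (not taken from the paper itself), the sketch below implements kernel regularized least-squares with a data-independent regularization parameter chosen as a function of the sample size n. The Gaussian kernel, its width sigma, the synthetic data, and the n^(-1/2) decay of the parameter are assumptions made only for this example; the paper derives the optimal decay rate under smoothness conditions on the regression function.

```python
import numpy as np

def gaussian_kernel(X1, X2, sigma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of X1 and X2.
    sq_dists = (np.sum(X1**2, axis=1)[:, None]
                + np.sum(X2**2, axis=1)[None, :]
                - 2.0 * X1 @ X2.T)
    return np.exp(-sq_dists / (2.0 * sigma**2))

def rls_fit(X, y, lam, sigma=1.0):
    # By the representer theorem, the minimizer of
    #   (1/n) sum_i (f(x_i) - y_i)^2 + lam * ||f||_H^2
    # is f(x) = sum_i c_i K(x, x_i) with (K + n*lam*I) c = y.
    n = X.shape[0]
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + n * lam * np.eye(n), y)

def rls_predict(X_train, c, X_test, sigma=1.0):
    # Evaluate f at the test points from the fitted coefficients.
    return gaussian_kernel(X_test, X_train, sigma) @ c

# Data-independent parameter choice: lam_n depends only on n, never on
# the observed data. The exponent 1/2 here is illustrative only.
rng = np.random.default_rng(0)
n = 200
X = rng.uniform(-1.0, 1.0, size=(n, 1))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_normal(n)
lam_n = n ** (-0.5)
c = rls_fit(X, y, lam_n)
y_hat = rls_predict(X, c, X)
print("training MSE:", np.mean((y_hat - y) ** 2))
```

The point of the data-independent choice is that lam_n is fixed before seeing the sample, so the bias-variance trade-off is resolved analytically rather than by data-driven tuning such as cross-validation.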
