Error analysis for online gradient descent algorithms in reproducing kernel Hilbert spaces

We consider online gradient descent algorithms with general convex loss functions in reproducing kernel Hilbert spaces (RKHS). These algorithms are well suited to learning from large training sets, since each step processes a single example. We give general conditions ensuring convergence of the algorithm in the RKHS norm, and derive explicit generalization error rates for the q-norm ε-insensitive regression loss by choosing the step sizes and the regularization parameter appropriately.
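To make the algorithm concrete, below is a minimal Python sketch of regularized online gradient descent in an RKHS with the q-norm ε-insensitive loss. The Gaussian kernel, the step-size schedule η_t = 1/√t, the default values of λ, ε, and q, and all function names are illustrative assumptions for this sketch, not the paper's prescribed choices.

```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian (RBF) kernel K(x, z) = exp(-||x - z||^2 / (2 sigma^2)).
    Illustrative choice; any Mercer kernel would do."""
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

def eps_insensitive_subgradient(pred, y, eps=0.1, q=1):
    """Subgradient w.r.t. pred of the q-norm eps-insensitive loss
    max(|pred - y| - eps, 0)^q."""
    residual = pred - y
    slack = abs(residual) - eps
    if slack <= 0.0:
        return 0.0  # inside the eps-tube: loss is flat, subgradient is zero
    return q * slack ** (q - 1) * np.sign(residual)

def online_kernel_gd(stream, kernel=gaussian_kernel, lam=0.01,
                     eta=lambda t: 1.0 / np.sqrt(t), eps=0.1, q=1):
    """One pass of regularized online gradient descent in an RKHS.

    Maintains f_t = sum_i alpha_i K(x_i, .) and applies the update
    f_{t+1} = (1 - eta_t * lam) * f_t - eta_t * l'(f_t(x_t), y_t) * K(x_t, .).
    """
    xs, alphas = [], []
    for t, (x, y) in enumerate(stream, start=1):
        # Evaluate f_t(x_t) through the kernel expansion.
        pred = sum(a * kernel(xi, x) for a, xi in zip(alphas, xs))
        step = eta(t)
        grad = eps_insensitive_subgradient(pred, y, eps=eps, q=q)
        # Shrink existing coefficients: this is the regularization term lam * f_t.
        alphas = [(1.0 - step * lam) * a for a in alphas]
        # Append a new coefficient only when the loss is active at (x_t, y_t),
        # which keeps the expansion sparse inside the eps-tube.
        if grad != 0.0:
            xs.append(x)
            alphas.append(-step * grad)
    return xs, alphas
```

Note that each step costs O(t) kernel evaluations because the expansion grows with the number of active examples seen so far; sparsification or truncation schemes can control this cost but are beyond the scope of this sketch.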
