Optimal rates for spectral algorithms with least-squares regression over Hilbert spaces

Abstract. In this paper, we study regression problems over a separable Hilbert space with the square loss, covering non-parametric regression over a reproducing kernel Hilbert space. We investigate a class of spectral/regularized algorithms, including ridge regression, principal component regression, and gradient methods. We prove optimal, high-probability convergence results with respect to variants of norms for the studied algorithms, under a capacity assumption on the hypothesis space and a general source condition on the target function. As a consequence, we obtain almost-sure convergence results with optimal rates. Our results improve and generalize previous results, filling a theoretical gap for the non-attainable cases.
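The abstract treats ridge regression, principal component regression, and gradient methods as instances of a common spectral-filtering scheme applied to the empirical (kernel) covariance. The sketch below, which is not taken from the paper, illustrates this viewpoint on a kernel matrix in Python; the function names, the Gaussian kernel choice, and the toy data are illustrative assumptions, and the filters g_lambda are the standard ones (Tikhonov, spectral cut-off, Landweber iteration).

```python
# Minimal sketch (assumptions, not the paper's code): spectral algorithms for
# kernel least-squares regression, written as filters g_lambda applied to the
# eigenvalues of the normalized kernel matrix K/n.

import numpy as np

def gaussian_kernel(X, Z, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and Z."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def spectral_estimator(K, y, lam, method="ridge", n_iter=100):
    """Return coefficients alpha so that f(x) = sum_i alpha_i k(x_i, x).

    Each method corresponds to a filter g_lambda applied to the eigenvalues
    of K/n: Tikhonov (ridge), spectral cut-off (principal component
    regression), or Landweber iteration (gradient descent with early stopping).
    """
    n = K.shape[0]
    sigma, U = np.linalg.eigh(K / n)           # eigen-decomposition of K/n
    if method == "ridge":                      # g(s) = 1 / (s + lam)
        g = 1.0 / (sigma + lam)
    elif method == "pcr":                      # g(s) = 1/s for s >= lam, else 0
        g = np.where(sigma >= lam, 1.0 / np.maximum(sigma, lam), 0.0)
    elif method == "gd":                       # g(s) = (1 - (1 - eta*s)^t) / s
        eta = 1.0 / max(sigma.max(), 1e-12)    # early stopping regularizes
        g = (1.0 - (1.0 - eta * sigma) ** n_iter) / np.maximum(sigma, 1e-12)
    else:
        raise ValueError(method)
    # alpha = g(K/n) y / n; for ridge this recovers (K + n*lam*I)^{-1} y.
    return U @ (g * (U.T @ y)) / n

# Toy usage: regress a noisy sine curve.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
K = gaussian_kernel(X, X)
alpha = spectral_estimator(K, y, lam=1e-3, method="ridge")
X_test = np.linspace(-3, 3, 5).reshape(-1, 1)
print(np.round(gaussian_kernel(X_test, X) @ alpha, 3))
```

The three branches differ only in the filter applied to the spectrum, which is exactly why convergence results of the kind stated in the abstract can be proved uniformly over such a class of algorithms.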
