On regularization algorithms in learning theory

In this paper we discuss the relation between learning theory and the regularization of linear ill-posed inverse problems. It is well known that Tikhonov regularization can be profitably used in the context of supervised learning, where it usually goes under the name of the regularized least-squares algorithm. Moreover, gradient descent learning, an analog of the Landweber regularization scheme, has recently been studied. We show that a notion of regularization, defined as is customary for ill-posed inverse problems, allows us to derive learning algorithms that are consistent and achieve fast convergence rates. It turns out that for priors expressed in terms of variable Hilbert scales in reproducing kernel Hilbert spaces, our results for Tikhonov regularization match those of Smale and Zhou [Learning theory estimates via integral operators and their approximations, submitted for publication, 2005] and improve the results for Landweber iterations obtained in Yao et al. [On early stopping in gradient descent learning, Constructive Approximation, submitted for publication, 2005]. Remarkably, our analysis shows that the same properties are shared by a large class of learning algorithms, essentially all the linear regularization schemes. The concept of operator monotone functions turns out to be an important tool for the analysis.
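To make the shared template concrete: the schemes above can all be written as spectral filters applied to the normalized kernel matrix, with Tikhonov regularization and Landweber iteration as two choices of filter. The following Python sketch is our illustration, not code from the paper; the Gaussian kernel, the toy data, and names such as `spectral_filter_fit` are assumptions made for the example, while the filter functions g_λ(σ) = 1/(σ + λ) and g_t(σ) = η Σ_{k<t} (1 − ησ)^k are the standard ones for Tikhonov and Landweber regularization in the inverse-problems literature.

```python
import numpy as np

def gaussian_kernel(X1, X2, width=1.0):
    # Gaussian (RBF) kernel matrix; any bounded Mercer kernel would do here.
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * width ** 2))

def spectral_filter_fit(K, y, g):
    # Coefficients c = (1/n) g(K/n) y: apply the filter g to the eigenvalues
    # of the normalized kernel matrix and map the labels through it.
    n = K.shape[0]
    evals, evecs = np.linalg.eigh(K / n)
    filtered = (evecs * g(np.maximum(evals, 0.0))) @ evecs.T  # V g(D) V^T
    return filtered @ y / n

def tikhonov(lam):
    # Tikhonov / regularized least squares: g(s) = 1 / (s + lambda),
    # so that c = (K + n*lambda*I)^{-1} y.
    return lambda s: 1.0 / (s + lam)

def landweber(t, eta=1.0):
    # Landweber / gradient descent stopped after t steps:
    # g(s) = eta * sum_{k=0}^{t-1} (1 - eta*s)^k, a truncated expansion of 1/s.
    # eta = 1 is safe here because the eigenvalues of K/n lie in [0, 1]
    # when K(x, x) = 1 on the diagonal.
    return lambda s: eta * sum((1.0 - eta * s) ** k for k in range(t))

# Toy regression problem (illustrative data, not from the paper).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 1))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_normal(50)
K = gaussian_kernel(X, X)

c_tik = spectral_filter_fit(K, y, tikhonov(1e-2))   # Tikhonov estimator
c_land = spectral_filter_fit(K, y, landweber(100))  # early-stopped Landweber

# Predictions at new points: f(x) = sum_i c_i K(x, x_i).
X_test = np.linspace(-1.0, 1.0, 5)[:, None]
f_tik = gaussian_kernel(X_test, X) @ c_tik
```

In this notation, the paper's claim is that consistency and the convergence rates hold for any filter g satisfying the usual regularization conditions, with the regularization parameter (λ, or the stopping index t) tuned to the sample size.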

[1] A. Verri, et al., Spectral Methods for Regularization in Learning Theory, 2006.

[2] F. Hansen, et al., Operator Inequalities Associated with Jensen's Inequality, 2000.

[3] S. Smale, et al., Learning Theory Estimates via Integral Operators and Their Approximations, 2007.

[4] S. Smale, et al., Shannon sampling II: Connections to learning theory, 2005.

[5] A. Tsybakov, et al., Optimal aggregation of classifiers in statistical learning, 2003.

[6] Tomaso A. Poggio, et al., Regularization Networks and Support Vector Machines, Adv. Comput. Math., 2000.

[7] Lorenzo Rosasco, et al., Learning from Examples as an Inverse Problem, J. Mach. Learn. Res., 2005.

[8] Michael I. Jordan, et al., Convexity, Classification, and Risk Bounds, 2006.

[9] László Györfi, et al., A Probabilistic Theory of Pattern Recognition, Stochastic Modelling and Applied Probability, 1996.

[10] Y. Yao, et al., On Early Stopping in Gradient Descent Learning, 2007.

[11] James V. Candy, et al., Adaptive and Learning Systems for Signal Processing, Communications, and Control, 2006.

[12] I. Pinelis, et al., Remarks on Inequalities for Large Deviation Probabilities, 1986.

[13] Nicolai Bissantz, et al., Convergence Rates of General Regularization Methods for Statistical Inverse Problems and Applications, SIAM J. Numer. Anal., 2007.

[14] A. Verri, et al., Empirical Effective Dimension and Optimal Rates for Regularized Least Squares Algorithm, 2005.

[15] N. Aronszajn, Theory of Reproducing Kernels, 1950.

[16] Adam Krzyzak, et al., A Distribution-Free Theory of Nonparametric Regression, Springer Series in Statistics, 2002.

[17] C. Carmeli, et al., Reproducing kernel Hilbert spaces and Mercer theorem, math/0504071, 2005.

[18] E. D. Vito, et al., Fast Rates for Regularized Least-squares Algorithm, 2005.

[19] Felipe Cucker, et al., On the mathematical foundations of learning, 2001.

[20] E. D. Vito, et al., Discretization Error Analysis for Tikhonov Regularization, 2006.

[21] Vladimir Vapnik, Statistical Learning Theory, 1998.

[22] Laurent Schwartz, Sous-espaces hilbertiens d'espaces vectoriels topologiques et noyaux associés (Noyaux reproduisants), 1964.

[23] Peter Mathé, et al., Regularization of some linear ill-posed problems with discretized random noisy data, Math. Comput., 2006.

[24] H. Engl, et al., Regularization of Inverse Problems, 1996.

[25] A. Caponnetto, et al., Optimal Rates for the Regularized Least-Squares Algorithm, Found. Comput. Math., 2007.

[26] Michael Solomyak, et al., Double Operator Integrals in a Hilbert Space, 2003.

[27] S. Boucheron, et al., Theory of classification: a survey of some recent advances, 2005.

[28] Gábor Lugosi, et al., Introduction to Statistical Learning Theory, Advanced Lectures on Machine Learning, 2004.

[29] P. Mathé, et al., Moduli of Continuity for Operator Valued Functions, 2002.

[30] P. Mathé, et al., Geometry of linear ill-posed problems in variable Hilbert scales, Inverse Problems 19, 789–803, 2003.