Optimal Rates for Regularization of Statistical Inverse Learning Problems

We consider a statistical inverse learning (also called inverse regression) problem, where we observe the image of a function $f$ through a linear operator $A$ at i.i.d. random design points $X_i$, corrupted by additive noise. The distribution of the design points is unknown and can be very general. We analyze simultaneously the direct (estimation of $Af$) and the inverse (estimation of $f$) learning problems. In this general framework, we obtain strong and weak minimax optimal rates of convergence (as the number of observations $n$ grows large) for a large class of spectral regularization methods over regularity classes defined through appropriate source conditions. This improves upon or completes previous results obtained in related settings. The optimality of the obtained rates is shown not only in the exponent of $n$ but also in the explicit dependence of the constant factor on the noise variance and the radius of the source condition set.
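To make the setting concrete, the observation model described above can be written as follows; the notation for the noise variance $\sigma^2$ and the source condition (operator $B$, smoothness $r$, radius $R$) is a standard illustrative choice sketching this type of framework, not a verbatim reproduction of the paper's definitions:

$$ Y_i = (Af)(X_i) + \varepsilon_i, \qquad i = 1, \dots, n, $$

where the $X_i$ are drawn i.i.d. from an unknown design distribution and the $\varepsilon_i$ are centered noise variables with variance $\sigma^2$. A typical Hölder-type source condition then restricts $f$ to a regularity class of the form

$$ f = B^{r} h, \qquad \|h\| \le R, $$

for a suitable self-adjoint positive operator $B$ (built from $A$ and the design distribution), a smoothness parameter $r > 0$, and a radius $R > 0$; minimax rates of the kind announced in the abstract are then stated jointly in terms of $n$, $\sigma^2$, and $R$.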
