Random Design Analysis of Ridge Regression

This work gives a simultaneous analysis of both the ordinary least squares estimator and the ridge regression estimator in the random design setting, under mild assumptions on the covariate/response distributions. In particular, the analysis provides sharp results on the “out-of-sample” prediction error, as opposed to the “in-sample” (fixed design) error. The analysis also reveals the effect of errors in the estimated covariance structure, as well as the effect of modeling errors; neither effect is present in the fixed design setting. The proofs of the main results are based on a simple decomposition lemma combined with concentration inequalities for random vectors and matrices.
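To make the in-sample/out-of-sample distinction concrete, here is a minimal sketch, not the paper's analysis: it fits the standard ridge estimator on a randomly drawn design and compares the error averaged over the observed design points (fixed design) with the error averaged over fresh draws from the covariate distribution (random design). All constants, the noise level, and the well-specified Gaussian model are illustrative assumptions.

```python
# Minimal sketch: in-sample (fixed design) vs. out-of-sample (random design)
# prediction error of the ridge estimator. Illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)

d, n, lam = 10, 100, 1.0           # dimension, sample size, ridge parameter (assumed values)
beta = rng.standard_normal(d)      # true coefficient vector (well-specified model assumed)

def sample(m):
    X = rng.standard_normal((m, d))              # random design: i.i.d. covariates
    y = X @ beta + 0.5 * rng.standard_normal(m)  # responses with additive noise
    return X, y

X, y = sample(n)

# Ridge estimator: beta_hat = (X'X + lam * I)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# In-sample (fixed design) error: averaged over the observed design points.
in_sample = np.mean((X @ (beta_hat - beta)) ** 2)

# Out-of-sample (random design) error: averaged over fresh covariate draws,
# i.e. ||beta_hat - beta||_Sigma^2; here the population covariance Sigma = I.
X_new, _ = sample(100_000)
out_of_sample = np.mean((X_new @ (beta_hat - beta)) ** 2)

print(f"in-sample: {in_sample:.4f}  out-of-sample: {out_of_sample:.4f}")
```

The gap between the two printed quantities reflects exactly the error in the estimated covariance structure that the abstract highlights: the empirical design only approximates the population covariance, so the fixed design error can understate the true prediction error.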
