Ridge regression and asymptotic minimax estimation over spheres of growing dimension

We study asymptotic minimax problems for estimating a $d$-dimensional regression parameter over spheres of growing dimension ($d\to \infty$). Assuming that the data follow a linear model with Gaussian predictors and errors, we show that ridge regression is asymptotically minimax and derive new closed-form expressions for its asymptotic risk under squared-error loss. The asymptotic risk of ridge regression is closely related to the Stieltjes transform of the Mar\v{c}enko-Pastur distribution and to the spectral distribution of the predictors in the linear model. We also propose adaptive ridge estimators, which adapt to the unknown radius of the sphere, and highlight connections with equivariant estimation. Our results are mostly relevant for asymptotic settings where the number of observations, $n$, is proportional to the number of predictors, $d$, that is, $d/n\to\rho\in(0,\infty)$.
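
The setting described above can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the paper's procedure: it assumes identity predictor covariance, draws the parameter uniformly from a sphere of radius `tau`, and uses the tuning value `lam = d * sigma**2 / tau**2`, which is the posterior-mean choice under a Gaussian prior with variance `tau**2 / d` per coordinate. The function names and default values are hypothetical.

```python
import numpy as np


def ridge_estimate(X, y, lam):
    """Ridge estimator (X'X + lam * I)^{-1} X'y; lam = 0 gives least squares."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def simulate_squared_error(n=200, d=100, tau=1.0, sigma=1.0, seed=0):
    """Return the squared-error loss of ridge and least squares for one draw.

    Assumptions (illustrative only): i.i.d. N(0, 1) predictors, N(0, sigma^2)
    errors, and a parameter drawn uniformly from the sphere of radius tau.
    """
    rng = np.random.default_rng(seed)

    # Parameter on the sphere of radius tau in R^d.
    beta = rng.standard_normal(d)
    beta *= tau / np.linalg.norm(beta)

    # Linear model with Gaussian predictors and Gaussian errors.
    X = rng.standard_normal((n, d))
    y = X @ beta + sigma * rng.standard_normal(n)

    # Assumed tuning: posterior-mean choice under a N(0, tau^2 / d) prior.
    lam = d * sigma**2 / tau**2

    beta_ridge = ridge_estimate(X, y, lam)
    beta_ols = ridge_estimate(X, y, 0.0)

    err_ridge = float(np.sum((beta_ridge - beta) ** 2))
    err_ols = float(np.sum((beta_ols - beta) ** 2))
    return err_ridge, err_ols


if __name__ == "__main__":
    err_ridge, err_ols = simulate_squared_error()
    print("ridge squared error:", err_ridge)
    print("least-squares squared error:", err_ols)
```

Here `d / n = 0.5`, a finite-sample stand-in for the proportional regime $d/n\to\rho$. Averaging the returned losses over many seeds would give a Monte Carlo estimate of the risk at these particular values of `n`, `d`, `tau`, and `sigma`.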
