Asymptotic behavior of unregularized and ridge-regularized high-dimensional robust regression estimators: rigorous results

We study the behavior of high-dimensional robust regression estimators in the asymptotic regime where $p/n$ tends to a finite non-zero limit. More specifically, we study ridge-regularized estimators, i.e., $\widehat{\beta}=\text{argmin}_{\beta \in \mathbb{R}^p} \frac{1}{n}\sum_{i=1}^n \rho(\varepsilon_i-X_i' \beta)+\frac{\tau}{2}\lVert\beta\rVert^2$. In a recently published paper, we developed, with collaborators, probabilistic heuristics to understand the asymptotic behavior of $\widehat{\beta}$. Here we give a rigorous proof that properly justifies all the arguments given in that paper. Our proof follows those heuristics and hence relies on ideas from random matrix theory, measure concentration and convex analysis. While most of the work is done for $\tau>0$, we show that, under some extra assumptions on $\rho$, it is possible to recover the case $\tau=0$ as a limiting case. We require that the $X_i$'s be i.i.d. with independent entries, but our proof handles the case where these entries are not Gaussian. A two-week-old paper of Donoho and Montanari [arXiv:1310.7320] studied a similar problem by a different method and with a different point of view. At this point, their interesting approach requires Gaussianity of the design matrix.
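
To make the definition of $\widehat{\beta}$ concrete, here is a minimal numerical sketch of the ridge-regularized objective. Everything specific in it is an assumption made for illustration and is not taken from the paper: the Huber loss as the choice of $\rho$, Rademacher ($\pm 1$) design entries as one example of non-Gaussian independent entries, Student-t errors, plain gradient descent as the solver, and the hypothetical function names `huber_grad`, `ridge_robust_fit` and `demo`.

```python
import numpy as np

def huber_grad(r, delta=1.0):
    """Derivative psi = rho' of the Huber loss (an assumed choice of rho)."""
    return np.clip(r, -delta, delta)

def ridge_robust_fit(X, eps, tau, delta=1.0, n_iter=2000):
    """Gradient descent on (1/n) sum_i rho(eps_i - X_i' beta) + (tau/2) ||beta||^2."""
    n, p = X.shape
    # psi is 1-Lipschitz, so the gradient of the objective is Lipschitz
    # with constant at most ||X||_2^2 / n + tau; use 1/L as the step size.
    L = np.linalg.norm(X, 2) ** 2 / n + tau
    beta = np.zeros(p)
    for _ in range(n_iter):
        r = eps - X @ beta                                   # residuals
        grad = -X.T @ huber_grad(r, delta) / n + tau * beta  # objective gradient
        beta -= grad / L
    return beta

def demo(n=200, p=100, tau=0.5, seed=0):
    """Fit on simulated data with p/n = 0.5; return beta_hat and the gradient norm."""
    rng = np.random.default_rng(seed)
    X = rng.choice([-1.0, 1.0], size=(n, p))  # independent, non-Gaussian entries
    eps = rng.standard_t(df=3, size=n)        # heavy-tailed errors (assumption)
    beta_hat = ridge_robust_fit(X, eps, tau)
    r = eps - X @ beta_hat
    grad = -X.T @ huber_grad(r) / n + tau * beta_hat
    return beta_hat, np.linalg.norm(grad)

if __name__ == "__main__":
    beta_hat, grad_norm = demo()
    print("||beta_hat|| =", np.linalg.norm(beta_hat))
    print("gradient norm at beta_hat =", grad_norm)
```

Because $\tau>0$ makes the objective strongly convex, the minimizer is unique, and a near-zero gradient norm at the returned point is a simple check that the first-order condition $-\frac{1}{n}\sum_{i=1}^n X_i \rho'(\varepsilon_i-X_i'\widehat{\beta})+\tau\widehat{\beta}=0$ holds numerically. The sketch only computes the estimator for one simulated data set; it does not reproduce any of the asymptotic results.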

[1]  I. Ibragimov, et al. On the Composition of Unimodal Distributions, 1956.

[2]  J. Lamperti. On Convergence of Stochastic Processes, 1962.

[3]  J. Moreau. Proximité et dualité dans un espace hilbertien, 1965.

[4]  J. Cooper. Total Positivity, Vol. I, 1970.

[5]  P. J. Huber. The 1972 Wald Lecture Robust Statistics: A Review, 1972.

[6]  A. Baranchik. Inadmissibility of Maximum Likelihood Estimators in Some Multiple Regression Problems with Three or More Independent Variables, 1973.

[7]  P. J. Huber. Robust Regression: Asymptotics, Conjectures and Monte Carlo, 1973.

[8]  A. Prékopa. On logarithmic concave measures and functions, 1973.

[9]  K. Wachter. The Strong Limits of Random Matrix Spectra for Sample Matrices of Independent Elements, 1978.

[10]  B. Efron, et al. The Jackknife Estimate of Variance, 1981.

[11]  S. Portnoy. Asymptotic behavior of M-estimators of p regression parameters when p, 1985.

[12]  Charles R. Johnson, et al. Matrix analysis, 1985, Statistical Inference for Engineers and Data Scientists.

[13]  J. W. Silverstein. The Smallest Eigenvalue of a Large Dimensional Wishart Matrix, 1985.

[14]  Stephen Portnoy, et al. A central limit theorem applicable to robust regression estimators, 1987.

[15]  E. Mammen. Asymptotics with increasing dimension for robust regression with applications to the bootstrap, 1989.

[16]  R. Durrett. Probability: Theory and Examples, 1993.

[17]  J. W. Silverstein. Strong convergence of the empirical distribution of eigenvalues of large dimensional random matrices, 1995.

[18]  Z. Bai. Methodologies in Spectral Analysis of Large Dimensional Random Matrices, a Review, 1999.

[19]  M. Shcherbina, et al. Rigorous Solution of the Gardner Problem, 2001, math-ph/0112003.

[20]  M. Talagrand, et al. Spin Glasses: A Challenge for Mathematicians, 2003.

[21]  Donna L. Mohr, et al. Multiple Regression, 2002, Encyclopedia of Autism Spectrum Disorders.

[22]  B. Ripley, et al. Robust Statistics, 2018, Encyclopedia of Mathematical Geosciences.

[23]  Anja Vogler, et al. An Introduction to Multivariate Statistical Analysis, 2004.

[24]  Stephen P. Boyd, et al. Convex Optimization, 2004, Algorithms and Theory of Computation Handbook.

[25]  Andrea Montanari, et al. Message-passing algorithms for compressed sensing, 2009, Proceedings of the National Academy of Sciences.

[26]  Noureddine El Karoui, et al. Concentration of measure and spectra of random matrices: Applications to correlation matrices, elliptical distributions and beyond, 2009, 0912.1950.

[27]  Convex Optimization in Signal Processing and Communications, 2010.

[28]  Sundeep Rangan, et al. Generalized approximate message passing for estimation with random linear mixing, 2010, 2011 IEEE International Symposium on Information Theory Proceedings.

[29]  Nathan Halko, et al. Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions, 2009, SIAM Rev.

[30]  Andrea Montanari, et al. The LASSO Risk for Gaussian Matrices, 2010, IEEE Transactions on Information Theory.

[31]  P. Bickel, et al. Optimal M-estimation in high-dimensional regression, 2013, Proceedings of the National Academy of Sciences.

[32]  P. Bickel, et al. On robust regression with high-dimensional predictors, 2013, Proceedings of the National Academy of Sciences.

[33]  Andrea Montanari, et al. High dimensional robust M-estimation: asymptotic variance via approximate message passing, 2013, Probability Theory and Related Fields.