Robust Estimation of High-Dimensional Mean Regression

Data subject to heavy-tailed errors are commonly encountered in various scientific fields, especially in the modern era of massive data. To address this problem, procedures based on quantile regression and least absolute deviation (LAD) regression have been developed in recent years. These methods essentially estimate the conditional median (or quantile) function, which can differ substantially from the conditional mean function when the error distribution is asymmetric or heteroscedastic. How can we efficiently estimate the mean regression function in the ultra-high dimensional setting when only a second moment exists? To solve this problem, we propose a penalized Huber loss with a diverging robustification parameter, which reduces the bias incurred by the traditional Huber loss. The resulting penalized robust approximate quadratic (RA-quadratic) loss is called RA-Lasso. In the ultra-high dimensional setting, where the dimensionality can grow exponentially with the sample size, our results reveal that the RA-Lasso estimator is consistent, converging at the same rate as the optimal rate in the light-tailed situation. We further study the computational convergence of RA-Lasso and show that the composite gradient descent algorithm produces a solution that attains the same optimal statistical rate after sufficiently many iterations. As a by-product, we also establish a concentration inequality for estimating the population mean when only the second moment exists. We compare RA-Lasso with other regularized robust estimators based on quantile regression and LAD regression. Extensive simulation studies demonstrate the satisfactory finite-sample performance of RA-Lasso.
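
To make the estimator and its computation concrete, here is a minimal sketch of the RA-Lasso program: the l1-penalized Huber loss, with robustification parameter tau, minimized by composite (proximal) gradient descent. The Huber loss is h_tau(u) = u^2/2 for |u| <= tau and tau*|u| - tau^2/2 otherwise, so its derivative simply clips residuals at +/- tau. The function names, the fixed step size, the plain non-accelerated iteration, and the tau and lambda scalings in the demo are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

def huber_score(r, tau):
    """Derivative of the Huber loss: identity inside [-tau, tau], clipped outside."""
    return np.clip(r, -tau, tau)

def soft_threshold(z, t):
    """Proximal map of t * ||.||_1: coordinatewise soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ra_lasso(X, y, lam, tau, n_iter=500):
    """Minimize (1/n) * sum_i huber_tau(y_i - x_i' beta) + lam * ||beta||_1
    by composite (proximal) gradient descent with a fixed step size."""
    n, p = X.shape
    # The smooth Huber part has Hessian bounded by X'X / n, so 1/L with
    # L = ||X||_2^2 / n is a safe (if conservative) step size.
    step = n / (np.linalg.norm(X, 2) ** 2)
    beta = np.zeros(p)
    for _ in range(n_iter):
        residual = y - X @ beta
        grad = -(X.T @ huber_score(residual, tau)) / n   # gradient of the smooth part
        beta = soft_threshold(beta - step * grad, step * lam)  # prox step on the l1 part
    return beta

# Illustrative demo: sparse signal, heavy-tailed (but finite-variance) errors.
rng = np.random.default_rng(0)
n, p, s = 200, 1000, 5
X = rng.standard_normal((n, p))
beta_star = np.zeros(p)
beta_star[:s] = 1.0
y = X @ beta_star + rng.standard_t(df=2.5, size=n)   # df > 2: second moment exists
tau = y.std() * np.sqrt(n / np.log(p))               # illustrative diverging-parameter scaling
lam = 0.1 * y.std() * np.sqrt(np.log(p) / n)         # illustrative sqrt(log p / n) scaling
beta_hat = ra_lasso(X, y, lam=lam, tau=tau)
```

The step size above is the conservative global bound for the smooth Huber part; a backtracking or accelerated variant of the same proximal scheme would converge faster, but the gradient-step-plus-soft-thresholding structure is unchanged.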
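
The by-product on population-mean estimation can be sketched in the same spirit. The sketch below is one Catoni-style construction rather than the paper's exact estimator: a clipped-score M-estimator solved by fixed-point iteration, where the median start, the truncation level tau = sigma_hat * sqrt(n / log(1/delta)), and the helper name `huber_mean` are assumptions made for illustration.

```python
import numpy as np

def huber_mean(x, tau, n_iter=200, tol=1e-12):
    """M-estimator of the mean: the root mu of sum_i psi_tau(x_i - mu) = 0,
    where psi_tau clips at +/- tau.  Solved by the fixed-point iteration
    mu <- mu + mean(psi_tau(x - mu)), which is monotone and non-expansive."""
    mu = float(np.median(x))                      # robust initial value
    for _ in range(n_iter):
        step = np.clip(x - mu, -tau, tau).mean()  # average clipped residual
        mu += step
        if abs(step) < tol:
            break
    return mu

# Illustrative demo: heavy-tailed sample with finite variance and true mean 0.
rng = np.random.default_rng(1)
x = rng.standard_t(df=2.5, size=5000)
delta = 0.01                                           # target exception probability
tau = x.std() * np.sqrt(len(x) / np.log(1 / delta))    # illustrative truncation level
print(huber_mean(x, tau), x.mean())                    # clipped estimate vs. raw average
```

With tau of this order, the clipped estimator concentrates around the true mean at a sub-Gaussian-type rate of order sigma * sqrt(log(1/delta) / n) even when only the second moment exists, which is the phenomenon the concentration inequality in the abstract quantifies.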
