Robust Signed-Rank Variable Selection in Linear Regression

The growing volume of data has made computationally efficient methods for identifying important predictors essential in statistical modeling. In the linear model, the Lasso is an effective way of selecting variables through penalized regression, and it has spawned substantial research on variable selection for models that depend on a linear combination of predictors. However, little of this work addresses the suboptimality of variable selection when the model errors are non-Gaussian or the data contain gross outliers. We propose the weighted signed-rank Lasso as a robust and efficient alternative to the least absolute deviations (LAD) and least squares Lasso. The approach is appealing for big data because data augmentation allows the estimation to be carried out as a single weighted L1 optimization problem. Selection and estimation consistency are established theoretically and evaluated via simulation studies. The results confirm the optimality of the rank-based approach for data with heavy-tailed or contaminated errors and for data containing high-leverage points.
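
As a rough illustration of the data-augmentation idea mentioned above, the sketch below recasts a weighted Wilcoxon-type rank objective, written in its pairwise-difference form, together with pseudo-rows encoding the Lasso penalty, as a single weighted L1 (LAD) regression. The function name weighted_rank_lasso, the pairwise product weights, the penalty scaling lam, and the use of statsmodels' QuantReg as the L1 solver are all assumptions made for this sketch; the paper's exact signed-rank scores and weighting scheme are not reproduced here.

```python
# Illustrative sketch only: a weighted rank-based Lasso fit cast as a single
# weighted L1 (LAD) problem via data augmentation.  The pairwise-difference
# form of the Wilcoxon-type objective and the simple product weights below
# are assumptions for illustration, not the paper's exact formulation.
from itertools import combinations

import numpy as np
import statsmodels.api as sm


def weighted_rank_lasso(X, y, lam, weights=None):
    """Approximate a weighted rank-based Lasso by solving one weighted
    L1 regression on augmented data (pairwise differences + penalty rows)."""
    n, p = X.shape
    if weights is None:
        weights = np.ones(n)

    # Pairwise differences: with Wilcoxon scores, the rank dispersion of the
    # residuals is proportional to sum_{i<j} |(y_i - y_j) - (x_i - x_j)' beta|.
    rows, resp, w = [], [], []
    for i, j in combinations(range(n), 2):
        rows.append(X[i] - X[j])
        resp.append(y[i] - y[j])
        w.append(weights[i] * weights[j])  # pairwise weight (assumption)

    # Data augmentation for the Lasso penalty: lam * |beta_k| is the absolute
    # residual of a pseudo-observation with y = 0 and x = lam * e_k.
    for k in range(p):
        e_k = np.zeros(p)
        e_k[k] = lam
        rows.append(e_k)
        resp.append(0.0)
        w.append(1.0)

    A = np.asarray(rows)
    b = np.asarray(resp)
    w = np.asarray(w)

    # Positive weights can be absorbed into the rows, so the whole criterion
    # becomes a single unweighted L1 fit (median regression, no intercept).
    fit = sm.QuantReg(w * b, A * w[:, None]).fit(q=0.5)
    return fit.params


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 60, 6
    X = rng.normal(size=(n, p))
    beta_true = np.array([2.0, 0.0, -1.5, 0.0, 0.0, 1.0])
    y = X @ beta_true + rng.standard_t(df=2, size=n)  # heavy-tailed errors
    print(weighted_rank_lasso(X, y, lam=5.0))
```

In practice one would typically solve the augmented problem with a linear-programming L1 solver, which can set penalized coefficients exactly to zero; the iteratively reweighted solver used in this sketch only drives them close to zero.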
