Penalised robust estimators for sparse and high-dimensional linear models

We introduce a new class of robust M-estimators for simultaneous parameter estimation and variable selection in high-dimensional regression models. We first motivate the key ingredient of our procedures, which is inspired by regularization methods used for wavelet thresholding in noisy signal processing. The resulting penalized estimation procedures are shown to enjoy the oracle property, both in the classical finite-dimensional case and in the high-dimensional case where the number of variables p is not fixed but may grow with the sample size n, and to achieve optimal asymptotic rates of convergence. A fast accelerated proximal gradient algorithm of coordinate descent type is proposed and implemented for computing the estimates; it proves remarkably efficient at solving the corresponding regularization problems, including the ultra-high-dimensional setting where p ≫ n. Finally, an extensive simulation study and several real data analyses compare the proposed procedures with a number of recent M-estimation methods and demonstrate their utility and advantages.
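To make the computational ingredient concrete, the following is a minimal sketch of an accelerated proximal gradient (FISTA-type) solver for one representative member of this family: a Huber-loss M-estimator with an ℓ1 penalty. This is an illustrative stand-in, not the paper's exact procedure; the function names, the fixed Huber tuning constant, and the choice of the ℓ1 penalty (whose proximal operator is soft thresholding) are assumptions made for the example.

```python
import numpy as np

def huber_grad(r, delta):
    # Derivative of the Huber loss with respect to the residuals:
    # r where |r| <= delta, and delta * sign(r) otherwise.
    return np.clip(r, -delta, delta)

def soft_threshold(z, t):
    # Proximal operator of t * ||.||_1 (soft thresholding).
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def fista_huber_lasso(X, y, lam, delta=1.345, n_iter=500):
    """Accelerated proximal gradient (FISTA) sketch for
       min_beta  sum_i huber(y_i - x_i' beta; delta) + lam * ||beta||_1.
    delta = 1.345 is the usual Huber tuning constant (assumed here)."""
    n, p = X.shape
    # The Huber score is 1-Lipschitz, so ||X||_2^2 bounds the gradient's
    # Lipschitz constant and gives a valid step size 1/L.
    L = np.linalg.norm(X, 2) ** 2
    beta = np.zeros(p)
    z = beta.copy()          # extrapolated (momentum) point
    t = 1.0
    for _ in range(n_iter):
        r = y - X @ z
        grad = -X.T @ huber_grad(r, delta)        # gradient of the smooth part
        beta_new = soft_threshold(z - grad / L, lam / L)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = beta_new + ((t - 1.0) / t_new) * (beta_new - beta)  # Nesterov step
        beta, t = beta_new, t_new
    return beta
```

Because the Huber score caps the influence of large residuals, outliers in y perturb the gradient by at most delta per observation, which is what lends the iterates their robustness relative to a plain Lasso fit.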
