Robust linear regression for high‐dimensional data: An overview

Digitization, the process of converting information into numbers, leads to larger and more complex data sets, which are also larger in the number of measured variables. This makes it harder or even impossible for the practitioner to identify outliers, that is, observations that are inconsistent with an underlying model. Classical least‐squares based procedures can be strongly affected by such outliers. In the regression context, this means that the parameter estimates can be biased, with consequences for the validity of statistical inference, for regression diagnostics, and for prediction accuracy. Robust regression methods aim to assign appropriate (typically reduced) weights to observations that deviate from the model. While robust regression techniques are widely known in the low‐dimensional case, researchers and practitioners may be less familiar with the corresponding developments for high‐dimensional data. Recently, different strategies have been proposed for robust regression in the high‐dimensional case, typically based on dimension reduction, on shrinkage (including sparsity), and on combinations of these techniques. A very recent concept is to downweight single cells of the data matrix rather than complete observations, with the goal of making better use of the model‐consistent information and thus achieving higher efficiency of the parameter estimates.
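To make the idea of assigning weights to deviating observations concrete, the following is a minimal sketch of a Huber-type M-estimator computed by iteratively reweighted least squares. It is an illustration only and is not taken from the overview: the function name `huber_irls`, the tuning constant `c = 1.345`, the MAD-based residual scale, and the simulated data are all assumptions chosen for the example. The setting is deliberately low-dimensional, and this simple estimator guards only against outliers in the response, not against leverage points or cellwise contamination.

```python
import numpy as np


def huber_irls(X, y, c=1.345, n_iter=50, tol=1e-8):
    """Huber-type M-estimate via iteratively reweighted least squares.

    Hypothetical helper for illustration. Returns (beta, w): coefficients
    with the intercept first, and the final observation weights in (0, 1].
    """
    X1 = np.column_stack([np.ones(len(y)), X])    # prepend intercept column
    beta = np.linalg.lstsq(X1, y, rcond=None)[0]  # least-squares start
    w = np.ones(len(y))
    for _ in range(n_iter):
        r = y - X1 @ beta
        # Robust residual scale: normalized median absolute deviation (MAD).
        scale = 1.4826 * np.median(np.abs(r - np.median(r)))
        u = np.abs(r) / max(scale, 1e-12)
        # Huber weights: 1 for small residuals, c/u for large ones.
        w = np.minimum(1.0, c / np.maximum(u, 1e-12))
        sw = np.sqrt(w)
        beta_new = np.linalg.lstsq(X1 * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, w


# Simulated data (assumption): y = 1 + 2x + noise, with 10% shifted responses.
rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=(n, 1))
y = 1.0 + 2.0 * x[:, 0] + 0.1 * rng.normal(size=n)
y[:10] += 20.0  # vertical outliers

beta_ls = np.linalg.lstsq(np.column_stack([np.ones(n), x]), y, rcond=None)[0]
beta_rob, weights = huber_irls(x, y)
print("least squares:", beta_ls)
print("Huber IRLS:   ", beta_rob)
print("largest weight among the outliers:", weights[:10].max())
```

In this sketch the contaminated responses receive weights close to zero, so the robust fit stays near the generating coefficients while the least-squares intercept is pulled toward the outliers. The high-dimensional methods surveyed in the overview combine weighting of this kind with dimension reduction or shrinkage, which this example does not attempt.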
