Robust subset selection

The best subset selection (or "best subsets") estimator is a classic tool for sparse regression, and developments in mathematical optimization over the past decade have made it more computationally tractable than ever. Notwithstanding its desirable statistical properties, the best subsets estimator is susceptible to outliers and can break down in the presence of a single contaminated data point. To address this issue, we propose a robust adaptation of best subsets that is highly resistant to contamination in both the response and the predictors. Our estimator generalizes the notion of subset selection to both predictors and observations, thereby achieving robustness in addition to sparsity. This procedure, which we call "robust subset selection" (or "robust subsets"), is defined by a combinatorial optimization problem, which we solve using modern discrete optimization methods. We formally establish the robustness of our estimator in terms of the finite-sample breakdown point of its objective value. In support of this result, we report experiments on both synthetic and real data that demonstrate the superiority of robust subsets over best subsets in the presence of contamination. Importantly, robust subsets fares competitively across several metrics when compared with popular robust adaptations of the Lasso.
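The core idea admits a compact illustration. The sketch below is not the paper's algorithm (which relies on modern discrete optimization); it is a minimal brute-force version under stated assumptions: enumerate predictor subsets of size k and, for each subset, alternate between fitting least squares and retaining the h observations with the smallest residuals, in the spirit of least trimmed squares. The function name `robust_subsets`, its parameters, and the use of NumPy are illustrative assumptions, not the authors' implementation.

```python
import itertools
import numpy as np

def robust_subsets(X, y, k, h, n_restarts=10, max_iter=50, seed=0):
    """Brute-force sketch (illustrative, not the paper's method): minimize
    the least squares loss over all predictor subsets of size k, fitting
    each subset on only the h observations with the smallest residuals via
    a least-trimmed-squares-style alternating inner loop."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best_loss, best_support, best_beta, best_obs = np.inf, None, None, None
    for support in itertools.combinations(range(p), k):
        Xs = X[:, support]
        for _ in range(n_restarts):
            # Start from a random subset of h observations.
            obs = rng.choice(n, size=h, replace=False)
            for _ in range(max_iter):
                # Fit ordinary least squares on the current observations.
                beta, *_ = np.linalg.lstsq(Xs[obs], y[obs], rcond=None)
                # Keep the h observations with the smallest squared residuals.
                resid = (y - Xs @ beta) ** 2
                new_obs = np.argsort(resid)[:h]
                if set(new_obs) == set(obs):
                    break  # observation subset is stable; stop iterating
                obs = new_obs
            loss = resid[obs].sum()
            if loss < best_loss:
                best_loss, best_support = loss, support
                best_beta, best_obs = beta, obs
    return best_support, best_beta, best_obs, best_loss

# Tiny demo: one grossly contaminated response, which the trimming absorbs.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 6))
y = 3.0 * X[:, 0] + 0.1 * rng.standard_normal(50)
y[0] += 100.0  # single contaminated data point
support, beta, obs, loss = robust_subsets(X, y, k=1, h=45)
print(support, beta)  # expect support (0,) with a coefficient near 3.0
```

Exhaustive enumeration of predictor subsets is exponential in p, which is why anything beyond toy sizes calls for the discrete optimization machinery the abstract refers to rather than a loop like this.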
