Rank-one Convexification for Sparse Regression

Sparse regression models are increasingly prevalent due to their interpretability and superior out-of-sample performance. However, exact sparse regression with an $\ell_0$ constraint restricting the support of the estimators is a challenging (\NP-hard) non-convex optimization problem. In this paper, we derive new strong convex relaxations for sparse regression, based on the ideal (convex-hull) formulations of rank-one quadratic terms with indicator variables. The new relaxations can be formulated as semidefinite optimization problems in an extended space and are stronger and more general than the state-of-the-art formulations, including the perspective reformulation and formulations with the reverse Huber and minimax concave penalty functions. Furthermore, the proposed rank-one strengthening can be interpreted as a \textit{non-separable, non-convex, unbiased} sparsity-inducing regularizer, which dynamically adjusts its penalty to the shape of the error function without inducing bias for the sparse solutions. In our computational experiments on benchmark datasets, the proposed conic formulations are solved within seconds and yield near-optimal solutions (0.4\% optimality gap) for the non-convex $\ell_0$-problems. The resulting estimators also outperform alternative convex approaches from a statistical perspective, achieving high prediction accuracy and good interpretability.
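For concreteness, the sketch below writes the exact $\ell_0$-constrained problem and the standard perspective-style convex relaxation that the proposed rank-one formulations strengthen. This is generic notation assumed here for illustration (design matrix $X$, response $y$, coefficients $\beta$, indicators $z$, cardinality bound $k$, ridge weight $\lambda$), not the paper's exact statement.

% Exact sparse regression with an explicit cardinality (l0) constraint:
% the indicator z_i in {0,1} marks whether coefficient beta_i may be nonzero.
\begin{align*}
\min_{\beta \in \mathbb{R}^p,\; z \in \{0,1\}^p} \quad
  & \tfrac{1}{2}\, \| y - X\beta \|_2^2 + \lambda \sum_{i=1}^{p} \beta_i^2 \\
\text{s.t.} \quad
  & \beta_i (1 - z_i) = 0, \quad i = 1, \dots, p, \\
  & \textstyle\sum_{i=1}^{p} z_i \le k.
\end{align*}
% The perspective reformulation strengthens the continuous relaxation by replacing
% each separable quadratic beta_i^2 with its perspective beta_i^2 / z_i, written here
% with an epigraph variable t_i via the rotated second-order cone beta_i^2 <= t_i z_i.
\begin{align*}
\min_{\beta \in \mathbb{R}^p,\; z \in [0,1]^p,\; t \ge 0} \quad
  & \tfrac{1}{2}\, \| y - X\beta \|_2^2 + \lambda \sum_{i=1}^{p} t_i \\
\text{s.t.} \quad
  & \beta_i^2 \le t_i\, z_i, \quad i = 1, \dots, p, \\
  & \textstyle\sum_{i=1}^{p} z_i \le k.
\end{align*}

The perspective relaxation convexifies only the separable diagonal terms $\beta_i^2$; the rank-one approach described in the abstract instead builds the convex hull of sets defined by non-separable rank-one terms of the form $(a^\top \beta)^2$ together with the indicator variables, which is what yields the stronger semidefinite-representable relaxations in an extended space.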
