The Discrete Dantzig Selector: Estimating Sparse Linear Models via Mixed Integer Linear Optimization

We propose a novel high-dimensional linear regression estimator: the <italic>Discrete Dantzig Selector</italic>, which minimizes the number of nonzero regression coefficients subject to a budget on the maximal absolute correlation between the features and the residuals. Motivated by significant advances in integer optimization over the past 10–15 years, we present a mixed integer linear optimization (<monospace>MILO</monospace>) approach to obtain <italic>certifiably optimal</italic> global solutions to this nonconvex optimization problem. The current state of algorithmics in integer optimization makes our proposal substantially more computationally attractive than the least squares subset selection framework based on integer <italic>quadratic</italic> optimization recently proposed by Bertsimas <italic>et al.</italic>, and than the continuous nonconvex quadratic optimization framework of Liu <italic>et al.</italic> We propose new discrete first-order methods, which, when paired with state-of-the-art <monospace>MILO</monospace> solvers, lead to good solutions for the <italic>Discrete Dantzig Selector</italic> problem within a given computational budget. We illustrate that our integrated approach provides globally optimal solutions in significantly shorter computation times than off-the-shelf <monospace>MILO</monospace> solvers. We demonstrate both theoretically and empirically that, in a wide range of regimes, the statistical properties of the <italic>Discrete Dantzig Selector</italic> are superior to those of popular <inline-formula> <tex-math notation="LaTeX">$\ell _{1}$ </tex-math></inline-formula>-based approaches. We illustrate that our approach can handle problem instances with <inline-formula> <tex-math notation="LaTeX">$p =10,\!000$ </tex-math></inline-formula> features with certifiable optimality, making it a highly scalable combinatorial variable selection approach for sparse linear modeling.
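The estimator described above, which minimizes the number of nonzero coefficients subject to a budget on the maximal absolute feature–residual correlation, admits a standard big-M MILO formulation: minimize the sum of binary indicators z subject to the linear constraints that X'(y − Xβ) lies in [−δ, δ] componentwise and |β_j| ≤ M·z_j. The sketch below solves a tiny instance with SciPy's HiGHS-backed `milp`; the big-M constant, the budget `delta`, and the synthetic data are illustrative assumptions, not the paper's formulation details or solver setup.

```python
# Illustrative big-M MILO sketch of the Discrete Dantzig Selector,
# solved with SciPy's HiGHS-backed `milp`. M, delta, and the data
# are illustrative choices, not the paper's settings.
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

rng = np.random.default_rng(0)
n, p = 30, 5
X = rng.standard_normal((n, p))
beta_true = np.array([3.0, 0.0, 0.0, -2.0, 0.0])
y = X @ beta_true                  # noiseless, for a clean illustration

delta = 1e-4                       # budget on max |feature-residual correlation|
M = 10.0                           # assumed big-M bound on |beta_j|

# Decision variables: [beta (p continuous), z (p binary)]; minimize sum(z).
c = np.concatenate([np.zeros(p), np.ones(p)])

# |X'(y - X beta)|_inf <= delta  <=>  X'y - delta <= X'X beta <= X'y + delta
corr = LinearConstraint(np.hstack([X.T @ X, np.zeros((p, p))]),
                        X.T @ y - delta, X.T @ y + delta)
# -M z_j <= beta_j <= M z_j links a nonzero beta_j to z_j = 1.
link_up = LinearConstraint(np.hstack([np.eye(p), -M * np.eye(p)]), -np.inf, 0)
link_lo = LinearConstraint(np.hstack([-np.eye(p), -M * np.eye(p)]), -np.inf, 0)

res = milp(c=c,
           constraints=[corr, link_up, link_lo],
           integrality=np.concatenate([np.zeros(p), np.ones(p)]),  # z binary
           bounds=Bounds(np.concatenate([-M * np.ones(p), np.zeros(p)]),
                         np.concatenate([M * np.ones(p), np.ones(p)])))

beta_hat = res.x[:p]
print("support size:", int(round(res.x[p:].sum())),
      "beta_hat:", np.round(beta_hat, 3))
```

On this noiseless instance the correlation budget essentially forces the least-squares fit, so the solver recovers the two-variable support of `beta_true`; on real data, δ trades off sparsity against the correlation budget, as in the paper.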

[1] Franco Giannessi, et al. Nonconvex Quadratic Programs, Linear Complementarity Problems, and Integer Linear Programs. Optimization Techniques, 1973.

[2] Tōru Maruyama. On Some Recent Developments in Convex Analysis (in Japanese), 1977.

[3] H. P. Williams, et al. Model Building in Mathematical Programming, 1979.

[4] Richard C. Larson, et al. Model Building in Mathematical Programming, 1979.

[5] I. Johnstone, et al. Ideal Spatial Adaptation by Wavelet Shrinkage, 1994.

[6] R. Tibshirani. Regression Shrinkage and Selection via the Lasso, 1996.

[7] John N. Tsitsiklis, et al. Introduction to Linear Optimization. Athena Scientific Optimization and Computation Series, 1997.

[8] Jianqing Fan, et al. Variable Selection via Nonconcave Penalized Likelihood and Its Oracle Properties, 2001.

[9] Eric R. Ziegel, et al. The Elements of Statistical Learning. Technometrics, 2003.

[10] R. Tibshirani, et al. Least Angle Regression, 2004, math/0406456.

[11] D. Madigan, et al. [Least Angle Regression]: Discussion, 2004.

[12] D. Ruppert. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2004.

[13] Yurii Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Applied Optimization, 2014.

[14] Dimitris Bertsimas, et al. Optimization over Integers, 2005.

[15] Stephen P. Boyd, et al. Convex Optimization. Algorithms and Theory of Computation Handbook, 2004.

[16] Terence Tao, et al. The Dantzig Selector: Statistical Estimation When p Is Much Larger Than n, 2005, math/0506081.

[17] Stephen P. Boyd, et al. Enhancing Sparsity by Reweighted ℓ1 Minimization, 2007, arXiv:0711.1612.

[18] Paul Tseng, et al. Exact Regularization of Convex Programs. SIAM J. Optim., 2007.

[19] R. Tibshirani, et al. Pathwise Coordinate Optimization, 2007, arXiv:0708.1485.

[20] Cun-Hui Zhang, et al. The Sparsity and Bias of the Lasso Selection in High-Dimensional Linear Regression, 2008, arXiv:0808.0967.

[21] Marc Teboulle, et al. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. SIAM J. Imaging Sci., 2009.

[22] Gareth M. James, et al. A Generalized Dantzig Selector with Shrinkage Tuning, 2009.

[23] P. Bickel, et al. Simultaneous Analysis of Lasso and Dantzig Selector, 2008, arXiv:0801.1095.

[24] Robert Tibshirani, et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition. Springer Series in Statistics, 2001.

[25] Gareth M. James, et al. DASSO: Connections Between the Dantzig Selector and Lasso, 2009.

[26] P. L. Combettes, et al. Dualization of Signal Recovery Problems, 2009, arXiv:0907.0436.

[27] Raymond Hemmecke, et al. Nonlinear Integer Programming. 50 Years of Integer Programming, 2009.

[28] M. Jünger, et al. 50 Years of Integer Programming 1958-2008: From the Early Years to the State-of-the-Art, 2010.

[29] Jeff T. Linderoth, et al. MILP Software, 2010.

[30] Der-San Chen, et al. Applied Integer Programming: Modeling and Solution, 2010.

[31] A. Sayed, et al. Adaptation, Learning, and Optimization over Networks. Foundations and Trends in Machine Learning, Vol. 7, Issue 4-5, 2011.

[32] Emmanuel J. Candès, et al. Templates for Convex Cone Problems with Applications to Sparse Signal Recovery. Math. Program. Comput., 2010.

[33] Sara van de Geer, et al. Statistics for High-Dimensional Data, 2011.

[34] T. Hastie, et al. SparseNet: Coordinate Descent with Nonconvex Penalties. Journal of the American Statistical Association, 2011.

[35] Stephen P. Boyd, et al. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Found. Trends Mach. Learn., 2011.

[36] S. Burer, et al. The MILP Road to MIQCP, 2012.

[37] Robert E. Bixby, et al. A Brief History of Linear and Mixed-Integer Programming Computation, 2012.

[38] D. Bertsimas, et al. Least Quantile Regression via Modern Optimization, 2013, arXiv:1310.8625.

[39] Yong Zhang, et al. Sparse Approximation via Penalty Decomposition Methods. SIAM J. Optim., 2012.

[40] Yurii Nesterov, et al. Gradient Methods for Minimizing Composite Functions. Mathematical Programming, 2012.

[41] Daniel Bienstock, et al. Cutting-Planes for Optimization of Convex Functions over Nonconvex Sets. SIAM J. Optim., 2014.

[42] Stephen P. Boyd, et al. Proximal Algorithms. Found. Trends Optim., 2013.

[43] Trevor Hastie, et al. Statistical Learning with Sparsity: The Lasso and Generalizations, 2015.

[44] Juan Pablo Vielma, et al. Mixed Integer Linear Programming Formulation Techniques. SIAM Rev., 2015.

[45] D. Bertsimas, et al. Best Subset Selection via a Modern Optimization Lens, 2015, arXiv:1507.03133.

[46] Paul Grigas, et al. A New Perspective on Boosting in Linear Regression via Subgradient Optimization and Relatives, 2015, arXiv.

[47] Runze Li, et al. Global Solutions to Folded Concave Penalized Nonconvex Learning. Annals of Statistics, 2016.

[48] Dimitris Bertsimas, et al. OR Forum: An Algorithmic Approach to Linear Regression. Oper. Res., 2016.

[49] Iain Dunning, et al. Extended Formulations in Mixed Integer Conic Quadratic Programming. Mathematical Programming Computation, 2015.

[50] K. Schittkowski, et al. Nonlinear Programming, 2022.