Discussion on: "Sparse Regression: Scalable Algorithms and Empirical Performance" and "Best Subset, Forward Stepwise, or Lasso? Analysis and Recommendations Based on Extensive Comparisons"

We congratulate the authors Bertsimas, Pauphilet and Van Parys (hereafter BPvP) and Hastie, Tibshirani and Tibshirani (hereafter HTT) for providing fresh and insightful views on the problem of variable selection and prediction in linear models. Their contributions at this fundamental level provide guidance for more complex models and procedures.
