Let me begin by congratulating the authors of these two papers, hereafter HTT and BPV, for their superb contributions to the comparison of methods for variable selection in high-dimensional regression. The methods considered are truly some of today's leading contenders for coping with the size and complexity of the big data problems of so much current importance. Not surprisingly, there is no clear winner here: the terrain of comparisons is vast and complex, and no single method can dominate across all situations. The setups considered vary greatly in the number of observations n, the number of predictors p, the number and relative sizes of the nonzero regression coefficients, the predictor correlation structures, and the signal-to-noise ratios (SNRs). And even these only scratch the surface of the infinite possibilities. Further, there is the question of which performance measure matters most. Is the goal of an analysis exact variable selection, prediction, or both? And what about computational speed and scalability? The answers naturally depend on the practical application at hand.
The methods compared by HTT and BPV have been made possible by extraordinary advances in computational speed, so it is tempting to distinguish them primarily by their novel implementation algorithms. In particular, the recent integer-optimization-based algorithms for variable selection differ in fundamental ways from the now widely adopted coordinate descent algorithms for the lasso-related methods. Undoubtedly, the speed gains delivered by these algorithms are critical to the feasibility of practical applications. However, the more fundamental explanation for the performance differences lies in the different criteria that these algorithms seek to optimize.
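To make the point about criteria concrete, here is a minimal sketch in Python (not code from HTT, BPV or BKM) that contrasts the two objectives on a tiny problem. Best subset is treated as L0-constrained least squares and, as an assumption for illustration, solved by brute-force enumeration in place of a mixed integer optimization solver; the lasso is treated as L1-penalized least squares with the common 1/(2n) loss scaling and solved by cyclic coordinate descent. The helper names (`solve`, `rss`, `best_subset`, `soft_threshold`, `lasso_cd`) and the toy data are hypothetical.

```python
from itertools import combinations


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[piv] = M[piv], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x


def rss(X, y, beta):
    """Residual sum of squares ||y - X beta||^2."""
    return sum((yi - sum(xij * bj for xij, bj in zip(row, beta))) ** 2
               for row, yi in zip(X, y))


def best_subset(X, y, k):
    """L0-constrained least squares: minimize ||y - X b||^2 with at most k nonzero b_j.
    Hypothetical brute-force stand-in for a mixed integer solver; tiny p only."""
    n, p = len(X), len(X[0])
    best_rss, best_beta = float("inf"), None
    # Size exactly k suffices: any smaller support lies inside some size-k support,
    # whose least-squares fit has RSS no larger.
    for S in combinations(range(p), k):
        XtX = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in S] for a in S]
        Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in S]
        beta = [0.0] * p
        for j, coef in zip(S, solve(XtX, Xty)):
            beta[j] = coef
        r = rss(X, y, beta)
        if r < best_rss:
            best_rss, best_beta = r, beta
    return best_beta


def soft_threshold(z, t):
    """Shrink z toward zero by t, setting it to zero if |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.
    Runs a fixed number of sweeps; no convergence check (an assumption for brevity)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # resid = y - X beta, starting from beta = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of x_j with the partial residual (x_j's own term added back).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


# Toy design with orthogonal columns (n = 4, p = 3); y = 3*x1 + 0.5*x2 exactly.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [3.5, 2.5, -2.5, -3.5]
print(best_subset(X, y, k=1))  # [3.0, 0.0, 0.0]: keeps the least-squares value
print(lasso_cd(X, y, lam=1.0))  # [2.0, 0.0, 0.0]: same support, shrunk by lam
```

With orthogonal columns of equal norm, the two criteria reduce to hard and soft thresholding of the least-squares coefficients. So even when they select the same variable, as here, the lasso estimate is shrunk toward zero while the best subset estimate is not, a difference that comes from the objective rather than from the algorithm used to optimize it.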
In an important sense, the methods are guided by different solutions to the general variable selection problem. Consider first the paper of HTT. Its main thrust appears to have been prompted by the computational breakthrough of Bertsimas, King and Mazumder (2016) (hereafter BKM), which proposed a mixed integer opti-
References
Bertsimas, D., et al. Best Subset Selection via a Modern Optimization Lens.
Bertsimas, D., et al. Sparse classification: a scalable discrete optimization perspective. Machine Learning.
George, E., et al. The Spike-and-Slab LASSO.
Hoerl, A. E., et al. Ridge Regression: Biased Estimation for Nonorthogonal Problems.
Ku, H. H., et al. Contributions to Probability and Statistics, Essays in Honor of Harold Hotelling.
Stein, C., et al. Estimation with Quadratic Loss.
Van Parys, B. P. G., et al. Sparse high-dimensional regression: Exact scalable algorithms and phase transitions. The Annals of Statistics.
Zhou, H. H., et al. Minimax estimation with thresholding and its application to wavelet analysis.