High-dimensional regression in practice: an empirical study of finite-sample prediction, variable selection and ranking

Penalized likelihood approaches are widely used for high-dimensional regression. Although many methods have been proposed and the associated theory is now well developed, their relative efficacy in finite-sample settings, as encountered in practice, remains incompletely understood. There is therefore a need for empirical investigations in this area that can offer practical insight and guidance to users. In this paper, we present a large-scale comparison of penalized regression methods. We distinguish between three related goals: prediction, variable selection and variable ranking. Our results span more than 2300 data-generating scenarios, including both synthetic and semisynthetic data (real covariates and simulated responses), allowing us to systematically consider the influence of various factors (sample size, dimensionality, sparsity, signal strength and multicollinearity). We consider several widely used approaches (Lasso, Adaptive Lasso, Elastic Net, Ridge Regression, SCAD, the Dantzig Selector and Stability Selection). We find considerable variation in performance across methods. Our results support a “no panacea” view, with no unambiguous winner across all scenarios or goals, even in this restricted setting where all data align well with the assumptions underlying the methods. The study allows us to make some recommendations as to which approaches may be most (or least) suitable given the analysis goal and certain data characteristics. Our empirical results complement existing theory and provide a resource to compare methods across a range of scenarios and metrics.
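To make the setup concrete, below is a minimal sketch, in Python with scikit-learn, of one data-generating scenario and the three goals the study distinguishes. It is an illustration under stated assumptions, not the authors' actual pipeline: the scenario parameters (sample size, dimensionality, sparsity, correlation, noise level), the cross-validation grids and the metrics (test-set MSE for prediction, F1 of the selected support for selection, AUC of the |coefficient| ordering for ranking) are all stand-ins, and only three of the seven methods compared in the paper are shown (SCAD, the Dantzig Selector and Stability Selection are omitted here).

```python
# Minimal sketch (not the authors' pipeline): one synthetic scenario,
# three penalized regression methods, three evaluation goals.
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV, ElasticNetCV
from sklearn.metrics import f1_score, roc_auc_score

rng = np.random.default_rng(0)

# One scenario: n samples, p variables, s-sparse signal, equicorrelated
# covariates (correlation rho) to mimic multicollinearity. All values
# are illustrative assumptions, not the grid used in the paper.
n, p, s, rho, sigma = 100, 500, 10, 0.5, 1.0
cov = rho * np.ones((p, p)) + (1 - rho) * np.eye(p)
X = rng.multivariate_normal(np.zeros(p), cov, size=n)
beta = np.zeros(p)
beta[:s] = 1.0                      # true support: first s coefficients
y = X @ beta + sigma * rng.standard_normal(n)

# Independent test set for assessing prediction.
X_test = rng.multivariate_normal(np.zeros(p), cov, size=1000)
y_test = X_test @ beta + sigma * rng.standard_normal(1000)

methods = {
    "Lasso": LassoCV(cv=5),
    "Ridge": RidgeCV(alphas=np.logspace(-3, 3, 50)),
    "ElasticNet": ElasticNetCV(cv=5, l1_ratio=[0.2, 0.5, 0.8]),
}

support = beta != 0
for name, model in methods.items():
    model.fit(X, y)
    coef = model.coef_
    # Goal 1: prediction -- test-set mean squared error.
    mse = np.mean((y_test - model.predict(X_test)) ** 2)
    # Goal 2: variable selection -- estimated support vs. the truth.
    f1 = f1_score(support, coef != 0)
    # Goal 3: variable ranking -- do large |coefficients| point at
    # true signals? (Ridge never sparsifies, so only goals 1 and 3
    # are really meaningful for it.)
    auc = roc_auc_score(support, np.abs(coef))
    print(f"{name:>10}: MSE={mse:.2f}  selection-F1={f1:.2f}  rank-AUC={auc:.2f}")
```

Sweeping the scenario parameters (n, p, s, rho, signal strength) over a grid and replicating each cell many times yields a scenario-by-method-by-metric array of results, which is the kind of resource the study described above provides at scale.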
