High-dimensional variable selection with heterogeneous signals: A precise asymptotic perspective

We study exact support recovery for high-dimensional sparse linear regression under an independent Gaussian design when the signals are weak, rare, and possibly heterogeneous. Under a suitable scaling of the sample size and signal sparsity, we fix the minimum signal magnitude at the information-theoretically optimal rate and investigate the asymptotic selection accuracy of best subset selection (BSS) and marginal screening (MS). We show that, despite this ideal setup and somewhat surprisingly, marginal screening can fail to achieve exact recovery with probability converging to one in the presence of heterogeneous signals, whereas BSS is model consistent whenever the minimum signal strength exceeds the information-theoretic threshold. To mitigate the computational intractability of BSS, we also propose an efficient two-stage algorithmic framework called ETS (Estimate Then Screen), comprising an estimation step followed by a gradient coordinate screening step, and show that, under the same scaling of sample size and sparsity, ETS achieves model consistency under the same information-theoretically optimal requirement on the minimum signal strength as BSS. Finally, we present a simulation study comparing ETS with the LASSO and marginal screening; the numerical results agree with our asymptotic theory even for realistic values of the sample size, dimension, and sparsity.
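To make the procedures concrete, below is a minimal Python sketch of marginal screening and of the two-stage ETS idea, written under stated assumptions: the pilot estimator (an ISTA-style LASSO solve), the penalty level lam, the iteration count, and the exact gradient coordinate screening statistic are illustrative choices rather than the paper's specification, and the sparsity level s is assumed known.

    import numpy as np

    def marginal_screening(X, y, s):
        # Rank coordinates by the marginal correlation |X_j^T y| and keep the top s.
        scores = np.abs(X.T @ y)
        return np.sort(np.argsort(scores)[-s:])

    def ets(X, y, s, lam=0.1, n_iter=200):
        # Stage 1 (Estimate): pilot estimate via iterative soft-thresholding (ISTA)
        # for the LASSO objective (1/2n)||y - X beta||^2 + lam ||beta||_1.
        n, p = X.shape
        beta = np.zeros(p)
        step = n / np.linalg.norm(X, 2) ** 2   # 1/L with L = ||X||_op^2 / n
        for _ in range(n_iter):
            grad = X.T @ (X @ beta - y) / n
            z = beta - step * grad
            beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
        # Stage 2 (Screen): gradient coordinate screening. Rank coordinates by a
        # one-step gradient update from the pilot estimate and keep the top s.
        grad = X.T @ (X @ beta - y) / n
        scores = np.abs(beta - step * grad)
        return np.sort(np.argsort(scores)[-s:])

    # Toy usage: rare and heterogeneous signals under independent Gaussian design.
    rng = np.random.default_rng(0)
    n, p, s = 400, 2000, 5
    X = rng.standard_normal((n, p))
    beta_true = np.zeros(p)
    support = np.sort(rng.choice(p, size=s, replace=False))
    beta_true[support] = rng.uniform(0.5, 2.0, size=s)   # heterogeneous magnitudes
    y = X @ beta_true + rng.standard_normal(n)

    print("true support:", support)
    print("MS  selects :", marginal_screening(X, y, s))
    print("ETS selects :", ets(X, y, s))

Note that in this sketch the screening statistic ranks coordinates by a one-step gradient update from the pilot estimate, so a coordinate can still be selected even when the pilot estimator zeroes it out; this decoupling of estimation from screening is what distinguishes the two-stage approach from selecting the support of the pilot estimate directly.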
