ExSIS: Extended Sure Independence Screening for Ultrahigh-dimensional Linear Models

Abstract: Statistical inference can be computationally prohibitive in ultrahigh-dimensional linear models. Correlation-based variable screening, in which one leverages marginal correlations to remove irrelevant variables from the model prior to statistical inference, can be used to overcome this challenge. Prior works on correlation-based variable screening either impose statistical priors on the linear model or assume specific post-screening inference methods. First, this paper extends the analysis of correlation-based variable screening to arbitrary linear models and post-screening inference techniques. In particular, (i) it shows that a condition, termed the screening condition, is sufficient for successful correlation-based screening of linear models, and (ii) it provides insights into the dependence of marginal correlation-based screening on different problem parameters. Numerical experiments confirm that these insights are not mere artifacts of the analysis; rather, they reflect the challenges inherent in marginal correlation-based variable screening. Second, the paper explicitly derives the screening condition for arbitrary (random or deterministic) linear models and, in the process, establishes that, under appropriate conditions, it is possible to reduce the dimension of an ultrahigh-dimensional, arbitrary linear model to almost the sample size even when the number of active variables scales almost linearly with the sample size. Third, it specializes the screening condition to sub-Gaussian linear models and contrasts the resulting guarantees with those existing in the literature. This specialization formally validates the claim that the main result of this paper generalizes existing results on correlation-based screening.
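
For concreteness, the following is a minimal Python sketch of plain marginal correlation-based screening (the classical sure independence screening idea of keeping the variables most correlated with the response), not the paper's ExSIS procedure itself. The function name sis_screen and the default retained-model size d = n - 1 are illustrative assumptions.

```python
import numpy as np

def sis_screen(X, y, d=None):
    """Marginal correlation (sure independence) screening sketch.

    Keeps the d columns of X most strongly marginally correlated
    with the response y. Illustrative only; not the paper's ExSIS
    algorithm.
    """
    n, p = X.shape
    if d is None:
        d = n - 1  # common convention: screen down to below the sample size

    # Standardize columns so inner products equal sample correlations.
    Xc = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yc = (y - y.mean()) / y.std(ddof=1)

    # Absolute marginal correlation between each variable and y.
    corr = np.abs(Xc.T @ yc) / (n - 1)

    # Retain the indices of the d largest marginal correlations.
    keep = np.argsort(corr)[::-1][:d]
    return np.sort(keep)

# Usage: screen a p = 5000 ultrahigh-dimensional problem down to n - 1 = 99
# candidate variables before running any post-screening inference method.
rng = np.random.default_rng(0)
n, p = 100, 5000
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 3.0  # 5 active variables
y = X @ beta + rng.standard_normal(n)
selected = sis_screen(X, y)
print(len(selected), selected[:10])
```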
