Partition-based ultrahigh-dimensional variable screening.

Traditional variable selection methods treat each covariate independently and therefore overlook useful information shared by covariates with similar functionality or spatial proximity. Leveraging prior grouping information on covariates, we propose partition-based screening methods for ultrahigh-dimensional variables in the framework of generalized linear models. We show that partition-based screening exhibits the sure screening property with a vanishing false selection rate. For settings in which prior knowledge on covariate grouping is unavailable or unreliable, we propose a data-driven partition screening framework and investigate its theoretical properties. We consider two special cases: correlation-guided partitioning and spatial location-guided partitioning. When no single partition is available, we propose a theoretically justified strategy for combining statistics from various partitioning methods. The utility of the proposed methods is demonstrated via simulation and analysis of functional neuroimaging data.
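To make the idea of screening within a partition concrete, the following is a minimal sketch for the linear-model special case (a Gaussian generalized linear model with identity link). It is an illustration under stated assumptions, not the procedure from the paper: the function name `partition_screen`, the use of absolute joint least-squares coefficients as the screening statistic, and the fixed `threshold` are all hypothetical choices made here. Each group of covariates is fitted jointly, so a covariate is judged together with its group rather than one at a time.

```python
import numpy as np

def partition_screen(X, y, groups, threshold):
    """Hypothetical sketch: group-wise least-squares screening.

    X         : (n, p) covariate matrix
    y         : (n,) response
    groups    : list of index lists forming a partition of range(p);
                each group is assumed to have fewer members than n
    threshold : cutoff applied to the screening statistic (assumed given)
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Standardize columns so coefficients are comparable across covariates.
    # Assumes no column is constant (nonzero standard deviation).
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()

    stats = np.zeros(X.shape[1])
    for idx in groups:
        idx = list(idx)
        # Joint fit within the group: covariates in the same group
        # are screened together instead of marginally.
        beta, *_ = np.linalg.lstsq(Xc[:, idx], yc, rcond=None)
        stats[idx] = np.abs(beta)

    # Retain covariates whose statistic reaches the cutoff.
    selected = np.flatnonzero(stats >= threshold)
    return stats, selected
```

A possible usage, with simulated data in which only the first covariate is active:

```python
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=200)
stats, selected = partition_screen(X, y, [[0, 1, 2], [3, 4, 5]], threshold=1.5)
```

For a non-Gaussian response, the least-squares fit would be replaced by a group-wise maximum likelihood fit of the chosen generalized linear model; the partition itself could come from prior knowledge, covariate correlations, or spatial locations, as the abstract describes.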
