Dynamic tilted current correlation for high dimensional variable screening

Abstract. Variable screening is widely used in high dimensional data analysis to reduce dimensionality so that available statistical methods remain applicable. The procedure is complicated and computationally burdensome because spurious correlations commonly exist among predictor variables, while important predictors may have only small marginal correlations with the response variable. To circumvent these issues, we develop a new screening technique, “dynamic tilted current correlation screening” (DTCCS), for high dimensional variable screening. DTCCS can select the most relevant predictors within a finite number of steps, and it includes the popular sure independence screening (SIS) method and the high-dimensional ordinary least squares projection (HOLP) approach as special cases. DTCCS has sure screening and consistency properties, which are justified theoretically and demonstrated numerically. The proposed procedure is applied to a real gene expression data set.
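
For orientation, the two special cases named in the abstract can be sketched in a few lines. This is a minimal illustration of SIS and HOLP only, not of the DTCCS procedure, whose details the abstract does not give. SIS ranks predictors by absolute marginal correlation with the response, and HOLP ranks them by the absolute entries of X^T (X X^T)^{-1} y. The function names, the column standardization, and the use of a pseudoinverse (so the sketch also runs when X X^T is singular after centering) are assumptions made here for illustration.

```python
import numpy as np


def standardize(X):
    """Center each column and scale it to unit Euclidean norm.

    Assumes no column is constant (otherwise the norm is zero).
    """
    Xc = X - X.mean(axis=0)
    return Xc / np.linalg.norm(Xc, axis=0)


def sis_scores(X, y):
    """SIS-style score: absolute marginal correlation of each column with y."""
    yc = y - y.mean()
    return np.abs(standardize(X).T @ yc) / np.linalg.norm(yc)


def holp_scores(X, y):
    """HOLP-style score: absolute entries of the minimum-norm least squares fit.

    For a design with full row rank, pinv(X) @ y equals X^T (X X^T)^{-1} y.
    Standardizing the columns first is an assumption of this sketch.
    """
    Xs = standardize(X)
    yc = y - y.mean()
    return np.abs(np.linalg.pinv(Xs) @ yc)


def top_k(scores, k):
    """Indices of the k largest scores, largest first."""
    return np.argsort(scores)[::-1][:k]
```

Either score vector can be passed to `top_k` to keep a reduced set of predictors, for example `top_k(holp_scores(X, y), 50)`, before fitting a model on the retained columns.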
