Variable Screening via Quantile Partial Correlation

ABSTRACT In quantile linear regression with ultrahigh-dimensional data, we propose an algorithm that screens all candidate variables and subsequently selects the relevant predictors. Specifically, we first employ quantile partial correlation for screening and then apply the extended Bayesian information criterion (EBIC) for best subset selection. The proposed method can successfully select predictors when the variables are highly correlated, and it can also identify variables that contribute to the conditional quantiles but are marginally uncorrelated or only weakly correlated with the response. Theoretical results show that the proposed algorithm yields the sure screening set and that, by controlling the false selection rate, it achieves model selection consistency. In practice, we propose using EBIC for best subset selection so that the resulting model is screening consistent. Simulation studies demonstrate that the proposed algorithm performs well, and an empirical example is presented. Supplementary materials for this article are available online.
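The two-step procedure described above can be illustrated with a short sketch. The Python code below is not the authors' implementation; it is a minimal example of the screening step, assuming hypothetical inputs y (responses), X (candidate predictors), a conditioning matrix Z of already-selected covariates, and a quantile level tau, and using statsmodels' QuantReg for the quantile regression fit. It ranks candidates by the absolute sample quantile partial correlation; the EBIC-based best subset selection mentioned in the abstract would then be applied to the retained set.

    # Minimal sketch of quantile partial correlation (QPC) screening at level tau.
    # Inputs are assumed: y (n,), X (n, p), and Z (n, q) holding the covariates
    # already conditioned on. Not the authors' code; an illustrative sketch only.
    import numpy as np
    import statsmodels.api as sm
    from statsmodels.regression.quantile_regression import QuantReg

    def qpc_scores(y, X, Z, tau=0.5):
        """Sample quantile partial correlation of y with each column of X given Z."""
        n, p = X.shape
        Zc = sm.add_constant(Z)                       # conditioning set plus intercept
        # Quantile regression of y on Z; psi_tau applied to its residuals.
        alpha = QuantReg(y, Zc).fit(q=tau).params
        psi = tau - (y - Zc @ alpha < 0).astype(float)
        scores = np.empty(p)
        for j in range(p):
            # Least-squares residual of X_j on the conditioning set.
            theta, *_ = np.linalg.lstsq(Zc, X[:, j], rcond=None)
            e_j = X[:, j] - Zc @ theta
            # Sample QPC: cov(psi, e_j) scaled by sqrt(tau(1-tau) var(e_j)).
            scores[j] = np.mean(psi * e_j) / np.sqrt(tau * (1 - tau) * np.mean(e_j ** 2))
        return scores

    # Screening: keep the d candidates with the largest |QPC|; d = floor(n / log(n))
    # is one conventional (assumed, not prescribed here) choice of cut-off.
    # keep = np.argsort(-np.abs(qpc_scores(y, X, Z, tau)))[:int(np.floor(n / np.log(n)))]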
