A note on quantile feature screening via distance correlation

In this paper, we propose a new feature screening procedure based on a robust quantile version of distance correlation, which has several desirable properties. First, it is particularly useful for data exhibiting heterogeneity, which is very common in high-dimensional data. Second, it is robust to model misspecification and behaves reliably when some of the features contain outliers or follow heavy-tailed distributions. Under very mild conditions, we establish its sure screening property. In practice, the same index set is often found to be adequate in the quantile analysis, so we further present a composite robust quantile version of distance correlation to perform feature screening. Simulation studies are carried out to examine the performance of the proposed procedures, which we also illustrate with a real data example.
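
To make the screening idea concrete, the following is a minimal Python sketch rather than the procedure defined in the paper. The abstract does not give the exact statistic, so the sketch assumes that the quantile version replaces the response Y with the quantile score tau - I(Y < q_tau), where q_tau is the sample tau-quantile of Y, and then ranks each feature by its ordinary sample distance correlation with that score. The function names, the default quantile level `tau=0.5`, and the default model size `n / log(n)` are all illustrative assumptions.

```python
import numpy as np


def _centered_dist(v):
    """Double-centered Euclidean distance matrix of a sample."""
    v = np.asarray(v, dtype=float).reshape(len(v), -1)
    d = np.sqrt(((v[:, None, :] - v[None, :, :]) ** 2).sum(axis=2))
    return (d - d.mean(axis=0, keepdims=True)
              - d.mean(axis=1, keepdims=True) + d.mean())


def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form), a value in [0, 1]."""
    a, b = _centered_dist(x), _centered_dist(y)
    dcov2 = (a * b).mean()
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    if denom <= 0:
        return 0.0  # one of the samples is constant
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def quantile_dc_screen(X, y, tau=0.5, d=None):
    """Rank the columns of X by distance correlation with the quantile score.

    Assumption: the score tau - I(y < q_tau) stands in for the paper's
    quantile construction, which the abstract does not spell out.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    if d is None:
        d = int(n / np.log(n))  # illustrative default model size
    q_tau = np.quantile(y, tau)
    psi = tau - (y < q_tau).astype(float)
    scores = np.array([distance_correlation(X[:, j], psi) for j in range(p)])
    keep = np.argsort(-scores)[:d]  # indices of the d largest scores
    return keep, scores
```

Because the score depends on Y only through the indicator I(Y < q_tau), extreme response values cannot dominate the ranking, which is one plausible reading of the robustness described above.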
