Model-free conditional screening via conditional distance correlation

Given knowledge of a predetermined set of active predictors, we develop a feature screening procedure based on conditional distance correlation learning. The proposed procedure can substantially reduce the correlation among highly correlated predictors and thus lower the numbers of false positives and false negatives. When the conditioning set is not available beforehand, a data-driven method is provided to select it. We establish both the ranking consistency and the sure screening property of the proposed procedure. Extensive simulations compare the new method with its competitors and show that it performs well in both linear and nonlinear models. Finally, a real data analysis further illustrates the effectiveness of the new method.
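The building block of such a procedure is the (sample) distance correlation: each predictor is scored by its dependence with the response, and the top-ranked predictors survive screening. Below is a minimal illustrative sketch of this ranking step using the unconditional distance correlation (the conditional version described in the abstract additionally adjusts for the predetermined active set via kernel weighting); the function names `distance_correlation` and `dc_screen` are our own, not from the paper.

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation of Szekely, Rizzo and Bakirov (2007)."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)

    def double_centered(a):
        # Pairwise Euclidean distance matrix, double-centered.
        d = np.sqrt(((a[:, None, :] - a[None, :, :]) ** 2).sum(axis=-1))
        return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()

    A, B = double_centered(x), double_centered(y)
    dcov2_xy = (A * B).mean()          # squared distance covariance
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return np.sqrt(dcov2_xy / denom) if denom > 0 else 0.0

def dc_screen(X, y, d):
    """Keep the d predictors with the largest marginal distance correlation."""
    scores = np.array([distance_correlation(X[:, j], y)
                       for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:d]
```

Because distance correlation is zero if and only if the two variables are independent, this ranking is model-free: it requires no specification of a linear or nonlinear regression form, which is what allows the screening property to hold across both model classes mentioned above.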
