Conditional quantile correlation learning for ultrahigh dimensional varying coefficient models and its application in survival analysis

In this paper, we consider a robust approach to ultrahigh dimensional variable screening under varying coefficient models. While existing works focus on the mean regression function, we propose a procedure based on conditional quantile correlation sure independence screening (CQCSIS). This proposal is applicable to heterogeneous or heavy-tailed data in general and is invariant to monotone transformation of the response. Furthermore, we generalize the screening procedure to censored lifetime data through inverse probability weighting. Because it relies on a nonparametric B-spline approximation, CQCSIS is easy to implement and can be computed much faster than the kernel-based screening method. Under some regularity conditions, we establish sure screening properties, including screening consistency and ranking consistency, for the proposed approaches. We also construct a two-stage variable selection procedure, based on group SCAD penalization, to further improve the performance of CQCSIS. Extensive simulation examples and data applications are presented for illustration.
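To make the screening idea concrete, the following is a minimal Python sketch of marginal screening with the unconditional quantile correlation, qcor_tau(Y, X) = E[psi_tau(Y - Q_tau(Y)) (X - E X)] / sqrt((tau - tau^2) var(X)), where psi_tau(w) = tau - I(w < 0). It is a simplified illustration and not the CQCSIS procedure itself: it omits the conditioning on an index variable, the B-spline approximation, and the inverse probability weighting for censoring. The function names `quantile_correlation` and `screen`, the simulated data, and the choice of keeping `d` covariates are all assumptions made for this example.

```python
import numpy as np

def quantile_correlation(y, x, tau=0.5):
    """Sample quantile correlation between response y and covariate x at level tau."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    q = np.quantile(y, tau)                  # sample tau-quantile of the response
    psi = tau - (y < q).astype(float)        # psi_tau(y - q) = tau - I(y - q < 0)
    cov = np.mean(psi * (x - x.mean()))      # sample quantile covariance
    return cov / np.sqrt((tau - tau**2) * x.var())

def screen(y, X, tau=0.5, d=10):
    """Rank columns of X by |quantile correlation| and keep the d largest."""
    scores = np.array([abs(quantile_correlation(y, X[:, j], tau))
                       for j in range(X.shape[1])])
    top = np.argsort(-scores)[:d]            # indices sorted by decreasing score
    return top, scores

# Hypothetical simulated data: only covariate 3 is active, noise is heavy-tailed.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 3] + rng.standard_t(2, size=n)

top, scores = screen(y, X, tau=0.5, d=5)

# The response enters only through the indicator I(y < q), so a strictly
# increasing transformation of y (here the cube) should leave the scores unchanged.
top_cubed, scores_cubed = screen(y**3, X, tau=0.5, d=5)
```

Because the response appears only through an indicator relative to its own sample quantile, the scores depend on the ranks of the response rather than its scale, which is the mechanism behind the robustness to heavy tails and the invariance to monotone transformation described above.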
