Joint model-free feature screening for ultra-high dimensional semi-competing risks data

High-dimensional semi-competing risks data, which involve two possibly correlated events (a terminal event and a non-terminal event), arise commonly in biomedical studies. However, statistical methods for analyzing such data have rarely been investigated. A joint model-free feature screening procedure for both terminal and non-terminal events is proposed, which allows the associated covariates to lie in an ultra-high dimensional feature space. The joint screening utility is constructed from the distance correlation between each predictor’s survival function and the joint survival function of the terminal and non-terminal events. Under rather mild technical assumptions, it is demonstrated that the proposed joint feature screening procedure enjoys the sure screening and ranking consistency properties. An adaptive threshold rule is further suggested to simultaneously identify the important covariates and determine their number. Extensive numerical studies are conducted to examine the finite-sample performance of the proposed methods. Lastly, the suggested joint feature screening procedure is illustrated through a real data example.
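As a rough illustration of the screening utility described in the abstract, the following Python sketch ranks predictors by the sample distance correlation between each predictor's empirical survival function and the empirical joint survival function of the two event times. It is a simplified sketch under stated assumptions, not the authors' estimator: it assumes no censoring, substitutes plain empirical survival functions for whatever estimators the paper uses, and keeps a fixed number of predictors instead of applying the adaptive threshold rule. All function names (`distance_correlation`, `empirical_survival`, `joint_empirical_survival`, `screen`) and the parameter `top_k` are hypothetical.

```python
import numpy as np


def _centered_distances(a):
    """Double-centered pairwise Euclidean distance matrix of a sample."""
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    d = np.sqrt(((a[:, None, :] - a[None, :, :]) ** 2).sum(axis=-1))
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form) between x and y."""
    A = _centered_distances(x)
    B = _centered_distances(y)
    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    if denom == 0:
        return 0.0
    # max(...) guards against tiny negative values from floating-point error
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def empirical_survival(x):
    """Fraction of observations strictly greater than each x_i."""
    x = np.asarray(x, dtype=float)
    return (x[None, :] > x[:, None]).mean(axis=1)


def joint_empirical_survival(t1, t2):
    """Fraction of subjects whose two event times both exceed (t1_i, t2_i).

    Assumption: both times are fully observed (no censoring).
    """
    t1 = np.asarray(t1, dtype=float)
    t2 = np.asarray(t2, dtype=float)
    both_greater = (t1[None, :] > t1[:, None]) & (t2[None, :] > t2[:, None])
    return both_greater.mean(axis=1)


def screen(X, t1, t2, top_k):
    """Rank the columns of X by the utility and keep the top_k indices."""
    X = np.asarray(X, dtype=float)
    h = joint_empirical_survival(t1, t2)
    utilities = np.array(
        [distance_correlation(empirical_survival(X[:, j]), h) for j in range(X.shape[1])]
    )
    order = np.argsort(-utilities)  # indices sorted by decreasing utility
    return order[:top_k], utilities
```

Calling `screen(X, t1, t2, top_k=5)` on an `n × p` covariate matrix `X` and two length-`n` vectors of event times returns the indices of the five highest-ranked predictors together with all `p` utilities. Handling censored observations would require replacing the two empirical survival functions with censoring-adjusted estimators.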
