A new joint screening method for right-censored time-to-event data with ultra-high dimensional covariates

In ultra-high dimensional settings, where the number of covariates is huge, variable screening is a useful dimension-reduction step before a more refined method is applied for model selection and statistical analysis. This paper proposes a new sure joint screening procedure for right-censored time-to-event data based on a sparsity-restricted semiparametric accelerated failure time model. Our method, referred to as Buckley-James assisted sure screening (BJASS), consists of two steps: an initial screening step that uses a sparsity-restricted least-squares estimate based on a synthetic time variable, and a refinement screening step that uses a sparsity-restricted least-squares estimate with the Buckley-James imputed event times. The refinement step may be repeated several times to obtain more stable results. We show that, for any fixed number of refinement steps, the BJASS procedure retains all important variables with probability tending to 1. Simulation results illustrate its performance in comparison with several marginal screening methods. Real data examples are provided using a diffuse large-B-cell lymphoma (DLBCL) dataset and a breast cancer dataset. We have implemented the BJASS method in MATLAB and made it available to readers on GitHub at https://github.com/yiucla/BJASS.
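To make the idea of a sparsity-restricted least-squares screen concrete, the following Python sketch selects at most `k` covariates by iterative hard thresholding. It is an illustration under stated assumptions, not the authors' MATLAB implementation: the abstract does not say how the restricted estimate is computed, so the use of iterative hard thresholding, the function names `hard_threshold` and `sparse_ls_screen`, the step size, and the iteration count are all hypothetical choices. The response `y` stands in for either the synthetic time variable (initial step) or the Buckley-James imputed event times (refinement step); neither construction is shown here.

```python
import numpy as np


def hard_threshold(beta, k):
    """Keep the k largest-magnitude entries of beta and set the rest to zero."""
    out = np.zeros_like(beta)
    if k > 0:
        keep = np.argpartition(np.abs(beta), -k)[-k:]
        out[keep] = beta[keep]
    return out


def sparse_ls_screen(X, y, k, n_iter=200):
    """Approximately minimise ||y - X beta||^2 subject to ||beta||_0 <= k.

    Assumption: the restricted problem is solved by iterative hard
    thresholding (a gradient step followed by keeping the k largest
    coefficients). Returns the sorted indices of the retained covariates.
    """
    n, p = X.shape
    # Step size 1 / (largest eigenvalue of X'X) keeps the iteration stable.
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)
        beta = hard_threshold(beta - step * grad, k)
    return np.sort(np.flatnonzero(beta))
```

In a BJASS-style workflow, one would call `sparse_ls_screen(X, y_synthetic, k)` once and then call it again with re-imputed responses for each refinement step. The size `k` of the retained set is a tuning choice that this sketch leaves to the user.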
