Survival impact index and ultrahigh‐dimensional model‐free screening with survival outcomes

Motivated by ultrahigh-dimensional biomarker screening studies, we propose a model-free screening approach tailored to censored lifetime outcomes. Our proposal is built upon a new measure, the survival impact index (SII). By design, SII sensibly captures the overall influence of a covariate on the outcome distribution. It can be estimated with familiar nonparametric procedures that do not require smoothing and are readily adaptable to lifetime outcomes under various censoring and truncation mechanisms. We provide large-sample distributional results that facilitate inference on SII in classical multivariate settings. More importantly, we investigate SII as an effective screener for ultrahigh-dimensional data, one that does not rely on rigid regression model assumptions in real applications. We establish the sure screening property of the proposed SII-based screener. Extensive numerical studies are carried out to assess the performance of our method compared with existing screening methods. A lung cancer microarray data set is analyzed to demonstrate the practical utility of our proposals.
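
To make the general workflow of marginal, model-free screening with right-censored outcomes concrete, the following is a minimal sketch. It is not the SII estimator, whose definition is not given in this abstract. As a stand-in utility, it splits each covariate at its median and averages the absolute gap between the two Kaplan-Meier curves. The function names `km_survival`, `marginal_utility`, and `screen`, the median split, and the time grid are all illustrative assumptions.

```python
import numpy as np


def km_survival(time, event, grid):
    """Kaplan-Meier survival estimate on `grid` for right-censored data."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    grid = np.asarray(grid, dtype=float)
    event_times = np.unique(time[event])  # sorted distinct event times
    if event_times.size == 0:
        # No observed events: the estimate stays at 1 everywhere.
        return np.ones(grid.shape)
    steps = np.empty(event_times.size)
    s = 1.0
    for k, t in enumerate(event_times):
        at_risk = np.sum(time >= t)
        deaths = np.sum((time == t) & event)
        s *= 1.0 - deaths / at_risk
        steps[k] = s
    # Number of event times <= each grid point; 0 means "before first event".
    idx = np.searchsorted(event_times, grid, side="right")
    return np.where(idx > 0, steps[np.maximum(idx - 1, 0)], 1.0)


def marginal_utility(x, time, event, grid):
    """Stand-in utility (an assumption, NOT the SII): mean absolute gap
    between Kaplan-Meier curves of the high and low halves of x."""
    high = x > np.median(x)
    s_hi = km_survival(time[high], event[high], grid)
    s_lo = km_survival(time[~high], event[~high], grid)
    return float(np.mean(np.abs(s_hi - s_lo)))


def screen(X, time, event, d):
    """Rank covariates by marginal utility and keep the top d indices."""
    X = np.asarray(X, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    # Illustrative grid choice: 50 points up to the 90th percentile of time.
    grid = np.linspace(0.0, np.quantile(time, 0.9), 50)
    scores = np.array(
        [marginal_utility(X[:, j], time, event, grid) for j in range(X.shape[1])]
    )
    keep = np.argsort(scores)[::-1][:d]
    return keep, scores
```

Calling `screen(X, time, event, d)` with an `n × p` covariate matrix, observed times, and event indicators returns the indices of the `d` highest-scoring covariates along with all scores. The retained set would then typically be passed to a lower-dimensional analysis; how `d` should be chosen is not addressed by this sketch.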
