Marginal variable screening for survival endpoints

When performing survival analysis in very high dimensions, it is often necessary to reduce the number of covariates by preliminary screening. In recent years, a large number of variable screening methods for the survival context have been developed. However, guidance on choosing an appropriate method in practice is lacking. The aim of this work is to provide an overview of marginal variable screening methods for survival endpoints and to develop recommendations for their use. For this purpose, a literature review is given, offering a comprehensive and structured introduction to the topic. In addition, a novel screening procedure based on distance correlation and martingale residuals is proposed, which is particularly useful for detecting non-monotone associations. To evaluate the performance of the discussed approaches, a simulation study is conducted that compares the true positive rates of competing variable screening methods in different settings. A real data example on mantle cell lymphoma is provided.
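The idea of screening with distance correlation and martingale residuals can be illustrated with a minimal NumPy sketch. It rests on assumptions that reflect our reading, not necessarily the authors' exact definition. First, the martingale residuals come from a covariate-free model, so M_i = δ_i - Λ̂_0(T_i), where Λ̂_0 is the Nelson-Aalen estimate of the cumulative hazard. Second, each covariate is scored by its sample distance correlation with these residuals, and the n_keep highest-scoring covariates are retained. The function names null_martingale_residuals, distance_correlation and dcor_martingale_screen, the cut-off n_keep, and the simulated U-shaped hazard are all hypothetical illustrations.

```python
import numpy as np


def null_martingale_residuals(time, event):
    """Martingale residuals M_i = delta_i - Lambda_hat(T_i) from a covariate-free
    model (assumption), with Lambda_hat the Nelson-Aalen cumulative hazard."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)
    n = time.size
    event_times = np.unique(time[event == 1])          # sorted distinct event times
    sorted_time = np.sort(time)
    # number at risk at each event time t_j: subjects with T_i >= t_j
    n_risk = n - np.searchsorted(sorted_time, event_times, side="left")
    # number of events observed exactly at each event time
    n_events = np.array([event[time == t].sum() for t in event_times], dtype=float)
    # cumulative hazard, with a leading 0 for times before the first event
    cumhaz = np.concatenate([[0.0], np.cumsum(n_events / n_risk)])
    # idx = number of event times <= T_i, so cumhaz[idx] equals Lambda_hat(T_i)
    idx = np.searchsorted(event_times, time, side="right")
    return event - cumhaz[idx]


def distance_correlation(x, y):
    """Sample distance correlation (V-statistic version) of two 1-D samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    def centered_distances(v):
        d = np.abs(v[:, None] - v[None, :])             # pairwise |v_i - v_k|
        # double centering: subtract column and row means, add back grand mean
        return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()

    a, b = centered_distances(x), centered_distances(y)
    dcov2 = (a * b).mean()                              # squared distance covariance
    dvar2 = (a * a).mean() * (b * b).mean()             # product of distance variances
    if dvar2 <= 0.0:                                    # one sample is constant
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / np.sqrt(dvar2)))


def dcor_martingale_screen(X, time, event, n_keep):
    """Score each column of X by dCor with the null-model martingale residuals
    and return the indices of the n_keep highest scores plus all scores."""
    X = np.asarray(X, dtype=float)
    resid = null_martingale_residuals(time, event)
    scores = np.array([distance_correlation(X[:, j], resid) for j in range(X.shape[1])])
    keep = np.argsort(-scores, kind="stable")[:n_keep]
    return keep, scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 400, 20
    X = rng.standard_normal((n, p))
    # hypothetical simulation: the hazard depends on X[:, 0] only through X0^2,
    # i.e. a non-monotone (U-shaped) association; other columns are noise
    t = rng.exponential(1.0 / np.exp(X[:, 0] ** 2))
    c = rng.exponential(2.0, size=n)                    # independent censoring times
    time, event = np.minimum(t, c), (t <= c).astype(int)
    keep, scores = dcor_martingale_screen(X, time, event, n_keep=3)
    print(keep, scores[keep].round(3))
```

Distance correlation is zero only under independence. A covariate that acts through X0^2 can therefore receive a high score even though its Pearson correlation with the residuals is close to zero when X0 is symmetric around 0. The direct O(n^2) distance computation is chosen here for clarity and becomes memory-intensive for large n.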
