Integrated powered density: Screening ultrahigh dimensional covariates with survival outcomes

Modern biomedical studies have yielded abundant survival data with high-throughput predictors. Variable screening is a crucial first step in analyzing such data, serving to identify predictive biomarkers, understand biological mechanisms, and make accurate predictions. To quantify nonparametrically the relevance of each candidate variable to the survival outcome, we propose integrated powered density (IPOD), which measures the differences among covariate-stratified distribution functions. This new class of statistics, equipped with a flexible weighting scheme, is general and includes the Kolmogorov statistic as a special case. Moreover, the method does not rely on rigid regression model assumptions and is easy to implement. We show that the method possesses the sure screening property and confirm its utility with extensive simulation studies. Finally, we apply it to a multiple myeloma study to detect gene signatures associated with patients' survival.
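The abstract does not state the IPOD formula. The Python/NumPy sketch below is therefore only a loose illustration of the general idea: marginal screening by comparing covariate-stratified survival functions, using a Kolmogorov-type (maximum absolute gap) comparison. For one covariate, it splits subjects at several cut points c into the strata X ≤ c and X > c. It estimates each stratum's survival curve by Kaplan-Meier and records the largest gap. Covariates are then ranked by that score. The function names (`km_survival`, `kolmogorov_utility`, `screen`) are hypothetical, as are the cut-point grid, the tail trimming at the `tau` quantile, and the toy data. None of this is the authors' estimator or its weighting scheme.

```python
import numpy as np


def km_survival(time, event, grid):
    """Kaplan-Meier estimate of S(t), evaluated at each point of `grid`."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    event_times = np.unique(time[event])        # distinct observed event times
    surv = np.empty(len(event_times))
    s = 1.0
    for i, u in enumerate(event_times):
        at_risk = np.sum(time >= u)             # subjects still under observation at u
        deaths = np.sum((time == u) & event)    # events occurring exactly at u
        s *= 1.0 - deaths / at_risk
        surv[i] = s
    # Right-continuous step function: S(g) is the product over event times <= g.
    idx = np.searchsorted(event_times, grid, side="right")
    return np.concatenate(([1.0], surv))[idx]


def kolmogorov_utility(x, time, event, n_cuts=7, tau=0.75):
    """Hypothetical Kolmogorov-type marginal utility for one covariate:
    max over cut points c and times t (up to the tau-quantile of observed
    times) of |S(t | X <= c) - S(t | X > c)|, with S estimated by Kaplan-Meier."""
    x = np.asarray(x, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    grid = np.unique(time[time <= np.quantile(time, tau)])  # trim the noisy KM tail
    cuts = np.quantile(x, np.linspace(0.2, 0.8, n_cuts))    # candidate stratification points
    best = 0.0
    for c in cuts:
        low = x <= c
        if low.all() or not low.any():
            continue                                         # skip degenerate splits
        diff = (km_survival(time[low], event[low], grid)
                - km_survival(time[~low], event[~low], grid))
        best = max(best, float(np.max(np.abs(diff))))
    return best


def screen(X, time, event, d):
    """Rank the columns of X by the utility above and keep the top d indices."""
    scores = np.array([kolmogorov_utility(X[:, j], time, event)
                       for j in range(X.shape[1])])
    return np.argsort(-scores)[:d], scores


# Toy data (illustrative only): column 0 alone drives the hazard,
# with independent exponential censoring.
rng = np.random.default_rng(0)
n, p = 400, 20
X = rng.standard_normal((n, p))
T = rng.exponential(scale=np.exp(-2.0 * X[:, 0]))  # hazard rate exp(2 * x_0)
C = rng.exponential(scale=3.0, size=n)
time, event = np.minimum(T, C), T <= C
top, scores = screen(X, time, event, d=5)
print(top)
```

On this toy data the signal column should rank first, because its two strata have clearly separated survival curves. Restricting the time grid to the `tau` quantile is a design choice in this sketch. It avoids the large, noisy jumps that Kaplan-Meier curves make when only a few subjects remain at risk.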
