Tuning Parameter Selection in Cox Proportional Hazards Model with a Diverging Number of Parameters

Regularized variable selection is a powerful tool for identifying the true regression model from a large number of candidates by applying penalties to the objective functions. The penalty functions typically involve a tuning parameter that controls the complexity of the selected model. The ability of regularized variable selection methods to identify the true model depends critically on the correct choice of the tuning parameter. In this study, we develop a consistent tuning parameter selection method for the regularized Cox proportional hazards model with a diverging number of parameters. The tuning parameter is selected by minimizing the generalized information criterion. We prove that, for any penalty that possesses the oracle property, the proposed tuning parameter selection method identifies the true model with probability approaching one as the sample size increases. Its finite-sample performance is evaluated by simulations. Its practical use is demonstrated on The Cancer Genome Atlas (TCGA) breast cancer data.
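To make the selection step concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a generic criterion of the form GIC(lambda) = -2 * (log partial likelihood) + a_n * (number of nonzero coefficients), where the penalty weight `a_n` is left as a user-supplied value because the abstract does not specify it. The function names, the no-ties assumption, and the idea that candidate coefficient vectors come from some penalized fit along a grid of tuning parameters are all illustrative assumptions.

```python
import numpy as np

def cox_partial_loglik(beta, X, time, event):
    """Cox log partial likelihood, assuming no tied event times."""
    eta = X @ beta
    # Sort by descending time so that the risk set of the i-th sorted
    # subject is exactly the first i+1 sorted subjects.
    order = np.argsort(-time)
    eta_s, event_s = eta[order], event[order]
    # Running log-sum-exp gives log(sum over risk set of exp(eta)).
    log_risk = np.logaddexp.accumulate(eta_s)
    return float(np.sum((eta_s - log_risk)[event_s == 1]))

def gic(beta, X, time, event, a_n):
    """Generic GIC: -2 * loglik + a_n * model size (a_n is an assumption)."""
    df = np.count_nonzero(beta)
    return -2.0 * cox_partial_loglik(beta, X, time, event) + a_n * df

def select_by_gic(candidate_betas, X, time, event, a_n):
    """Return the index of the candidate fit with the smallest GIC."""
    scores = [gic(b, X, time, event, a_n) for b in candidate_betas]
    return int(np.argmin(scores))
```

In use, `candidate_betas` would hold one estimated coefficient vector per tuning parameter value, and the selected tuning parameter is the one whose fit minimizes the criterion. How fast `a_n` should grow with the sample size and the number of parameters is the substance of the paper's theory and is not reproduced here.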
