An improved variable selection procedure for adaptive Lasso in high-dimensional survival analysis

Motivated by high-dimensional genomic studies, we develop an improved variable selection procedure for the adaptive Lasso in high-dimensional survival analysis. The proposed procedure reduces false discoveries while maintaining the false negative proportion, improving on existing adaptive Lasso procedures. It is straightforward to implement and flexible enough to accommodate large-scale problems where traditional procedures are impractical. To quantify the uncertainty of variable selection and control the family-wise error rate, we develop a testing algorithm based on multiple sample splitting. The practical utility of the proposed procedure is examined through simulation studies. The methods are then applied to a multiple myeloma data set.
