Gradient lasso for Cox proportional hazards model

MOTIVATION: There has been increasing interest in expressing a survival phenotype (e.g. time to cancer recurrence or death), or its distribution, in terms of the expression levels of a subset of genes. Because gene expression data are high dimensional, however, fitting a prediction model such as Cox's proportional hazards model runs into a serious collinearity problem. To avoid it, several methods based on penalized Cox proportional hazards models have been proposed. These methods, however, suffer from severe computational problems, such as slow or even failed convergence, because model fitting requires inverting high-dimensional matrices. We propose fitting the lasso-penalized Cox regression model with the gradient lasso algorithm, which converges to the global optimum faster than competing algorithms. Moreover, the gradient lasso algorithm is guaranteed to converge to the optimum under mild regularity conditions. Our gradient lasso algorithm can therefore be a useful tool for developing prediction models based on high-dimensional covariates such as gene expression data.

RESULTS: Simulation studies showed that the prediction model fitted by gradient lasso recovers the prognostic genes. Results on the diffuse large B-cell lymphoma datasets and the Norway/Stanford breast cancer dataset further indicate that our method is highly competitive with the popular existing methods of Park and Hastie and of Goeman in computational time, predictive accuracy and selectivity.

AVAILABILITY: The R package glcoxph is available at http://datamining.dongguk.ac.kr/R/glcoxph.
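To make the gradient lasso idea concrete, the Python sketch below fits the constrained form of the lasso-penalized Cox model, minimizing the negative log partial likelihood -l(beta) subject to ||beta||_1 <= lambda (called `radius` in the code). It is a simplified illustration under stated assumptions, not the glcoxph implementation: it keeps only the coordinate-wise "addition" step, which amounts to a Frank-Wolfe update over the L1 ball, and omits the deletion step that, as we understand it, the full gradient lasso algorithm also uses. Ties are handled with the Breslow convention and the line search is a crude halving scheme. The function names (`cox_loss`, `cox_gradient`, `gradient_lasso_cox`) and the toy data are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def _linear_predictor(beta: Sequence[float], X: Matrix) -> List[float]:
    """eta_i = x_i' beta for every subject i."""
    return [sum(b * x for b, x in zip(beta, row)) for row in X]


def cox_loss(beta: Sequence[float], X: Matrix, time: Sequence[float],
             event: Sequence[int]) -> float:
    """Negative Cox log partial likelihood (Breslow convention for ties)."""
    eta = _linear_predictor(beta, X)
    loss = 0.0
    for i, (t_i, d_i) in enumerate(zip(time, event)):
        if not d_i:
            continue  # censored subjects only contribute through risk sets
        risk = [eta[j] for j, t_j in enumerate(time) if t_j >= t_i]
        m = max(risk)  # shift for a numerically stable log-sum-exp
        loss -= eta[i] - (m + math.log(sum(math.exp(e - m) for e in risk)))
    return loss


def cox_gradient(beta: Sequence[float], X: Matrix, time: Sequence[float],
                 event: Sequence[int]) -> List[float]:
    """Gradient of cox_loss: minus the sum over events of (x_i - risk-set weighted mean)."""
    eta = _linear_predictor(beta, X)
    grad = [0.0] * len(beta)
    for i, (t_i, d_i) in enumerate(zip(time, event)):
        if not d_i:
            continue
        risk = [j for j, t_j in enumerate(time) if t_j >= t_i]
        m = max(eta[j] for j in risk)
        w = [math.exp(eta[j] - m) for j in risk]
        total = sum(w)
        for k in range(len(beta)):
            mean_k = sum(wj * X[j][k] for wj, j in zip(w, risk)) / total
            grad[k] -= X[i][k] - mean_k
    return grad


def gradient_lasso_cox(X: Matrix, time: Sequence[float], event: Sequence[int],
                       radius: float, max_iter: int = 500,
                       tol: float = 1e-8) -> List[float]:
    """Minimise cox_loss subject to ||beta||_1 <= radius using addition steps only."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        g = cox_gradient(beta, X, time, event)
        j = max(range(p), key=lambda k: abs(g[k]))  # steepest coordinate
        # Vertex of the L1 ball minimising the linearised loss: -radius * sign(g_j) * e_j.
        vertex = [0.0] * p
        vertex[j] = -radius * math.copysign(1.0, g[j])
        # Frank-Wolfe duality gap g'(beta - vertex); it is zero at the constrained optimum.
        if sum(gk * (bk - vk) for gk, bk, vk in zip(g, beta, vertex)) <= tol:
            break
        # Crude line search over step sizes 1, 1/2, 1/4, ... on the segment beta -> vertex.
        best_alpha, best_loss = 0.0, cox_loss(beta, X, time, event)
        for s in range(30):
            a = 0.5 ** s
            cand = [(1 - a) * b + a * v for b, v in zip(beta, vertex)]
            cand_loss = cox_loss(cand, X, time, event)
            if cand_loss < best_loss:
                best_alpha, best_loss = a, cand_loss
        if best_alpha == 0.0:
            break  # no tried step decreased the loss
        beta = [(1 - best_alpha) * b + best_alpha * v for b, v in zip(beta, vertex)]
    return beta


if __name__ == "__main__":
    # Hypothetical toy data: covariate 0 drives the hazard, covariate 1 is noise.
    X = [[3.0, 0.1], [2.5, -0.2], [2.0, 0.3], [1.5, -0.1],
         [1.0, 0.2], [0.5, -0.3], [0.0, 0.0], [-0.5, 0.1]]
    time = [1, 2, 3, 4, 5, 6, 7, 8]
    event = [1, 1, 1, 0, 1, 1, 0, 1]
    print(gradient_lasso_cox(X, time, event, radius=1.0))
```

Each update is a convex combination of the current estimate and a vertex of the L1 ball, so every iterate stays inside the constraint set and at most one new coefficient becomes nonzero per iteration. Every step needs only loss and gradient evaluations and no matrix inversion, which is the computational point stressed in the abstract.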

[1] M. Akritas. Nearest Neighbor Estimation of a Bivariate Distribution Under Random Censoring. 1994.

[2] R. Tibshirani. The lasso method for variable selection in the Cox model. Statistics in Medicine, 1997.

[3] D. Botstein et al. Exploring the new world of the genome with DNA microarrays. Nature Genetics, 1999.

[4] T. Lumley et al. Time-Dependent ROC Curves for Censored Survival Data and a Diagnostic Marker. Biometrics, 2000.

[5] L. Staudt et al. The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. The New England Journal of Medicine, 2002.

[6] Jianqing Fan et al. Variable Selection for Cox's Proportional Hazards Model and Frailty Model. 2002.

[7] R. Tibshirani et al. Repeated observation of breast tumor subtypes in independent gene expression data sets. Proceedings of the National Academy of Sciences of the United States of America, 2003.

[8] Yongdai Kim et al. Gradient LASSO for feature selection. ICML, 2004.

[9] R. Tibshirani et al. Least angle regression. math/0406456, 2004.

[10] R. Tibshirani et al. Semi-Supervised Methods to Predict Patient Survival from Gene Expression Data. PLoS Biology, 2004.

[11] Jiang Gui et al. Penalized Cox regression analysis in the high-dimensional and low-sample size settings, with applications to microarray gene expression data. Bioinformatics, 2005.

[12] Jian Huang et al. LASSO Method for Additive Risk Models with High Dimensional Covariates. 2005.

[13] Jiang Gui et al. Threshold Gradient Descent Method for Censored Data Regression with Applications in Pharmacogenomics. Pacific Symposium on Biocomputing, 2004.

[14] Jürgen Wolf et al. CASPAR: a hierarchical Bayesian approach to predict survival times in cancer from gene expression data. Bioinformatics, 2006.

[15] M. Segal. Microarray gene expression data with linked survival phenotypes: diffuse large-B-cell lymphoma revisited. Biostatistics, 2006.

[16] R. Tibshirani et al. Prediction by Supervised Principal Components. 2006.

[17] Hongzhe Li. Censored Data Regression in High-Dimension and Low-Sample Size Settings For Genomic Applications. 2006.

[18] M. Yuan et al. Model selection and estimation in regression with grouped variables. 2006.

[19] P. Sen et al. A Copula Approach for Detecting Prognostic Genes Associated With Survival Outcome in Microarray Studies. Biometrics, 2007.

[20] CASPAR: a hierarchical Bayesian approach to predict survival times in cancer from gene expression data. Bioinformatics, 2007.

[21] Hao Helen Zhang et al. Adaptive Lasso for Cox's proportional hazards model. 2007.

[22] Kim et al. A gradient-based optimization algorithm for LASSO. 2008.

[23] H. Zou. A note on path-based variable selection in the penalized proportional hazards model. 2008.

[24] Song Liu. Variable selection in semi-parametric additive models with extensions to high dimensional data and additive Cox models. 2008.

[25] Yang Jing. L1 Regularization Path Algorithm for Generalized Linear Models. 2008.

[26] D. R. Cox. Regression Models and Life-Tables. 1972.