Doubly Penalized Buckley–James Method for Survival Data with High‐Dimensional Covariates

Recent cancer research has focused on predicting patient survival from gene expression profiles obtained by microarray analysis. We propose a doubly penalized Buckley-James method for the semiparametric accelerated failure time model to relate high-dimensional genomic data to censored survival outcomes. The method uses the elastic-net penalty, a mixture of L1- and L2-norm penalties. Like the elastic-net method for linear regression with uncensored data, the proposed method performs automatic gene selection and parameter estimation, and highly correlated genes can be selected (or removed) together. The two-dimensional tuning parameter is determined by generalized cross-validation. The proposed method is evaluated by simulations and applied to the Michigan squamous cell lung carcinoma study.
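
The two ingredients named above can be illustrated with a minimal sketch; it is not the authors' implementation. It shows a Buckley-James imputation step, which replaces each censored log survival time by its fitted value plus the Kaplan-Meier estimate of the conditional mean residual, and an elastic-net penalty that could be added to a least-squares loss on the imputed responses. The function names, the scaling of the penalty, the assumption of no tied residuals, and the convention of treating the largest residual as uncensored are all assumptions made for illustration.

```python
def km_masses(resid, delta):
    """Kaplan-Meier jump masses for residuals (assumes no ties).

    delta[i] is 1 for an observed event and 0 for a censored time.
    The largest residual is treated as an event (an assumed convention)
    so that the masses sum to one.
    """
    n = len(resid)
    order = sorted(range(n), key=lambda i: resid[i])
    mass = [0.0] * n
    surv = 1.0  # Kaplan-Meier survival just before the current residual
    for rank, i in enumerate(order):
        at_risk = n - rank
        is_event = delta[i] == 1 or rank == n - 1
        if is_event:
            jump = surv / at_risk
            mass[i] = jump
            surv -= jump
    return mass


def bj_impute(y, fitted, delta):
    """One Buckley-James imputation step on the log-time scale.

    Uncensored responses are kept. A censored response is replaced by
    fitted value + estimated E[residual | residual > observed residual].
    """
    n = len(y)
    resid = [y[i] - fitted[i] for i in range(n)]
    mass = km_masses(resid, delta)
    y_star = []
    for i in range(n):
        if delta[i] == 1:
            y_star.append(y[i])
            continue
        tail = [k for k in range(n) if resid[k] > resid[i]]
        tail_mass = sum(mass[k] for k in tail)
        if tail_mass > 0:
            cond_mean = sum(resid[k] * mass[k] for k in tail) / tail_mass
        else:
            cond_mean = resid[i]  # nothing lies beyond the largest residual
        y_star.append(fitted[i] + cond_mean)
    return y_star


def elastic_net_penalty(beta, lam1, lam2):
    """Mixture of L1- and squared L2-norm penalties (scaling is assumed)."""
    l1 = sum(abs(b) for b in beta)
    l2_sq = sum(b * b for b in beta)
    return lam1 * l1 + lam2 * l2_sq


if __name__ == "__main__":
    # Toy data: four log survival times, the second one censored.
    y = [1.0, 2.0, 3.0, 4.0]
    fitted = [0.0, 0.0, 0.0, 0.0]
    delta = [1, 0, 1, 1]
    print(bj_impute(y, fitted, delta))
    print(elastic_net_penalty([1.0, -2.0, 0.0], lam1=0.5, lam2=0.1))
```

In a full procedure one would alternate between imputing the censored responses at the current coefficients and refitting an elastic-net regression on the imputed responses, with the pair (lam1, lam2) playing the role of the two-dimensional tuning parameter.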
