Predicting survival from microarray data—a comparative study

Motivation: Survival prediction from gene expression data and other high-dimensional genomic data has been the subject of much research in recent years. Such data pose the methodological problem that there are many more gene expression values than individuals. In addition, the responses are censored survival times. Most of the proposed methods handle this by using Cox's proportional hazards model and obtaining parameter estimates through some form of dimension reduction or parameter shrinkage. Using three well-known microarray gene expression data sets, we compare the prediction performance of seven such methods: univariate selection, forward stepwise selection, principal components regression (PCR), supervised principal components regression, partial least squares regression (PLS), ridge regression and the lasso.

Results: Statistical learning from subsets of the data should be repeated several times to obtain a fair comparison between methods. Methods using coefficient shrinkage or linear combinations of the gene expression values perform much better than the simple variable selection methods. For our data sets, ridge regression has the best overall performance.

Availability: Matlab and R code for the prediction methods is available at http://www.med.uio.no/imb/stat/bmms/software/microsurv/.

Contact: hegembo@math.uio.no
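
For reference, here is a generic textbook statement of Cox's proportional hazards model and of the penalized partial likelihoods behind ridge regression and the lasso, assuming no tied event times. The notation is introduced here and is not taken from the abstract: h_0 is the baseline hazard, x_i is the vector of p gene expression values for individual i (with p much larger than n), δ_i is 1 for an observed death and 0 for a censored time, R(t_i) is the set of individuals still at risk at time t_i, and λ ≥ 0 controls the amount of shrinkage.

```latex
h(t \mid \mathbf{x}_i) = h_0(t)\,\exp(\boldsymbol{\beta}^\top \mathbf{x}_i)

\ell(\boldsymbol{\beta}) = \sum_{i=1}^{n} \delta_i \Big[ \boldsymbol{\beta}^\top \mathbf{x}_i
  - \log \sum_{j \in R(t_i)} \exp(\boldsymbol{\beta}^\top \mathbf{x}_j) \Big]

\hat{\boldsymbol{\beta}}_{\text{ridge}} = \arg\max_{\boldsymbol{\beta}}
  \Big\{ \ell(\boldsymbol{\beta}) - \tfrac{\lambda}{2} \sum_{k=1}^{p} \beta_k^2 \Big\},
\qquad
\hat{\boldsymbol{\beta}}_{\text{lasso}} = \arg\max_{\boldsymbol{\beta}}
  \Big\{ \ell(\boldsymbol{\beta}) - \lambda \sum_{k=1}^{p} |\beta_k| \Big\}
```

Dimension-reduction methods such as PCR and PLS instead replace x_i by a small number of linear combinations of the gene expression values and then fit the Cox model to those.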

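A minimal sketch of one of the compared approaches, ridge-penalized Cox regression, evaluated over repeated random training/test splits. Everything here is an illustrative assumption rather than the authors' Matlab or R code: the function names are hypothetical, λ is fixed instead of being tuned (for example by cross-validation), the 2:1 split ratio and the number of splits are arbitrary, the data are simulated, and Harrell's concordance index is swapped in as a generic performance measure. It uses only NumPy and assumes no tied survival times and a moderate λ.

```python
import numpy as np


def penalized_loglik(X, time, event, beta, lam):
    """Cox log partial likelihood minus the ridge penalty (lam / 2) * ||beta||^2.

    Breslow form; assumes no tied event times.
    """
    eta = X @ beta
    risk = (time[None, :] >= time[:, None]).astype(float)  # risk[i, j]: t_j >= t_i
    m = eta.max()                                           # shift for stability
    log_den = m + np.log(risk @ np.exp(eta - m))            # log-sum over each risk set
    return event @ (eta - log_den) - 0.5 * lam * (beta @ beta)


def cox_ridge(X, time, event, lam, max_iter=50, tol=1e-8):
    """Maximize penalized_loglik by Newton-Raphson with step halving."""
    event = np.asarray(event, dtype=float)
    beta = np.zeros(X.shape[1])
    risk = (time[None, :] >= time[:, None]).astype(float)
    for _ in range(max_iter):
        eta = X @ beta
        P = risk * np.exp(eta - eta.max())[None, :]
        P /= P.sum(axis=1, keepdims=True)        # P[i, j]: weight of j in i's risk set
        xbar = P @ X                              # weighted covariate mean per risk set
        grad = event @ (X - xbar) - lam * beta
        # Hessian: minus the event-weighted risk-set covariances, minus lam * I
        second = X.T @ (X * (event @ P)[:, None])
        outer = (xbar * event[:, None]).T @ xbar
        hess = -(second - outer) - lam * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)        # beta - step is the Newton update
        if np.max(np.abs(step)) < tol:
            return beta - step
        obj, t = penalized_loglik(X, time, event, beta, lam), 1.0
        while penalized_loglik(X, time, event, beta - t * step, lam) < obj and t > 1e-6:
            t *= 0.5                              # halve until the objective does not drop
        beta = beta - t * step
    return beta


def harrell_c(time, event, score):
    """Harrell's C: share of usable pairs where the earlier event has the higher score."""
    num = den = 0.0
    for i in np.flatnonzero(event):
        later = time > time[i]                    # subjects observed to outlive subject i
        den += later.sum()
        num += (score[i] > score[later]).sum() + 0.5 * (score[i] == score[later]).sum()
    return num / den


def repeated_splits(X, time, event, lam, n_splits=20, train_frac=2 / 3, seed=0):
    """Refit on random training subsets; score each fit on the held-out individuals."""
    rng = np.random.default_rng(seed)
    n_train = int(round(train_frac * len(time)))
    scores = []
    for _ in range(n_splits):
        idx = rng.permutation(len(time))
        tr, te = idx[:n_train], idx[n_train:]
        beta = cox_ridge(X[tr], time[tr], event[tr], lam)
        scores.append(harrell_c(time[te], event[te], X[te] @ beta))
    return np.array(scores)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, p = 90, 200                                # many more "genes" than individuals
    X = rng.standard_normal((n, p))
    true_beta = np.zeros(p)
    true_beta[:10] = 0.5                          # a few informative genes (simulated)
    t_event = rng.exponential(np.exp(-X @ true_beta))   # hazard proportional to exp(x'beta)
    t_cens = rng.exponential(2 * np.median(t_event), size=n)
    time, event = np.minimum(t_event, t_cens), (t_event <= t_cens).astype(int)
    scores = repeated_splits(X, time, event, lam=50.0, n_splits=10)
    print(f"C-index over {len(scores)} splits: mean {scores.mean():.3f}, sd {scores.std():.3f}")
```

Running the script prints the mean and standard deviation of the C-index across splits; the spread between splits illustrates why a single training/test split can give a misleading comparison between methods.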