On the prognostic value of survival models with application to gene expression signatures

As part of the validation of any statistical model, it is good statistical practice to quantify the prediction accuracy and the amount of prognostic information the model represents; this includes gene expression signatures derived from high-dimensional microarray data. For right-censored survival data, several approaches exist that measure the gain in prognostic information over established clinical parameters or biomarkers in terms of explained variation or explained randomness. They are either model-based or use estimates of prediction accuracy. Because these measures differ in their underlying mechanisms, they vary in their interpretation, assumptions and properties, in particular in how they deal with censoring. It remains unclear under what conditions and to what extent they are comparable. We present a comparison of several common measures and illustrate their behaviour in high-dimensional situations, both in simulation examples and in applications to real gene expression microarray data sets. An overview of available software implementations in R is given.
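To make the idea of a model-based measure concrete, the following minimal sketch computes a likelihood-ratio-based R², 1 − exp(−LR/n), and uses it to express the gain of a combined model over a clinical-only model. It is an illustration under stated assumptions, not necessarily one of the measures compared in the paper: the function names, the log-likelihood values and the choice of `n` are hypothetical, and the log-likelihoods are assumed to come from models (for example Cox partial likelihoods) fitted elsewhere to the same data.

```python
import math


def lr_r_squared(loglik_null, loglik_model, n):
    """Likelihood-ratio-based R^2: 1 - exp(-LR / n), LR = 2 * (l_model - l_null).

    n is the normalising count. With censored data, whether to use the
    number of subjects or the number of observed events is a modelling
    choice that this sketch leaves to the caller (assumption).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    lr = 2.0 * (loglik_model - loglik_null)
    return 1.0 - math.exp(-lr / n)


def added_value(loglik_null, loglik_clinical, loglik_combined, n):
    """Return R^2 of the clinical model, of the combined model, and their difference."""
    r2_clinical = lr_r_squared(loglik_null, loglik_clinical, n)
    r2_combined = lr_r_squared(loglik_null, loglik_combined, n)
    return r2_clinical, r2_combined, r2_combined - r2_clinical


if __name__ == "__main__":
    # Hypothetical log-likelihoods, for illustration only.
    r2_clin, r2_comb, gain = added_value(-100.0, -95.0, -90.0, n=50)
    print(f"clinical: {r2_clin:.3f}, combined: {r2_comb:.3f}, gain: {gain:.3f}")
```

Because the measure depends on `n`, the same pair of fitted models can yield different values under different censoring patterns, which is one reason measures of this kind are not directly comparable with measures based on prediction accuracy.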
