Estimation of prediction error for survival models

When statistical models are used to predict the values of unobserved random variables, loss functions are often used to quantify the accuracy of a prediction. The expected loss over some specified set of occasions is called the prediction error. This paper considers the estimation of prediction error when regression models are used to predict survival times, and discusses the use of these estimates. Extending previous work, we consider both point and confidence interval estimation of prediction error, and allow for variable selection and model misspecification. Different estimators are compared in a simulation study using an absolute relative error loss function. The results indicate that cross-validation procedures typically produce reliable point estimates and confidence intervals, whereas model-based estimates are sensitive to model misspecification. Links between performance measures for point predictors and for predictive distributions of survival times are also discussed. The methodology is illustrated in a medical setting involving survival after treatment for disease.
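To make the idea of a cross-validation estimate of prediction error concrete, here is a minimal sketch in Python. It is not the paper's procedure. It assumes uncensored survival times, a log-linear regression fitted by least squares, and the loss form |t - t_hat| / t as one plausible reading of "absolute relative error". All function names are hypothetical, and a real survival analysis would also need to handle censoring.

```python
import numpy as np


def absolute_relative_error(t, t_hat):
    # Assumed loss form: |t - t_hat| / t, for positive survival times t.
    return np.abs(t - t_hat) / t


def fit_log_linear(X, t):
    # Least-squares fit of log(t) on X plus an intercept.
    # Assumption: no censoring, so every survival time is fully observed.
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, np.log(t), rcond=None)
    return beta


def predict_time(beta, X):
    # Point predictor exp(x' beta) on the original time scale.
    A = np.column_stack([np.ones(len(X)), X])
    return np.exp(A @ beta)


def cv_prediction_error(X, t, n_folds=5, seed=0):
    # K-fold cross-validation: each observation's loss is computed from a
    # model fitted without that observation's fold.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(t))
    losses = np.empty(len(t))
    for held_out in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, held_out)
        beta = fit_log_linear(X[train], t[train])
        losses[held_out] = absolute_relative_error(
            t[held_out], predict_time(beta, X[held_out])
        )
    return losses.mean(), losses


def apparent_error(X, t):
    # Loss evaluated on the same data used for fitting, for comparison only.
    beta = fit_log_linear(X, t)
    return absolute_relative_error(t, predict_time(beta, X)).mean()


# Simulated example data (purely illustrative, not from the paper).
rng = np.random.default_rng(1)
X_sim = rng.normal(size=(200, 3))
t_sim = np.exp(1.0 + 0.5 * X_sim[:, 0] + 0.3 * rng.normal(size=200))

cv_err, per_obs_losses = cv_prediction_error(X_sim, t_sim)
print("cross-validated error:", cv_err)
print("apparent error:", apparent_error(X_sim, t_sim))
```

The per-observation losses returned alongside the mean could serve as raw material for an interval estimate, although which interval construction is appropriate is a separate question that this sketch does not address.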
