Tests of calibration and goodness‐of‐fit in the survival setting

To assess the calibration of a predictive model in a survival analysis setting, several authors have extended the Hosmer-Lemeshow goodness-of-fit test to survival data. Grønnesby and Borgan developed a test under the proportional hazards assumption, and Nam and D'Agostino developed a nonparametric test that applies in more general survival settings but only to data with limited censoring. We analyze the performance of the two tests and show that the Grønnesby-Borgan test attains appropriate size in a variety of settings, whereas the Nam-D'Agostino test has a Type 1 error rate above the nominal level when censoring is more than trivial. Both tests are sensitive to small cell sizes. We develop a modification of the Nam-D'Agostino test that allows for higher censoring rates. We show that the modified Nam-D'Agostino test controls Type 1 error appropriately, has power comparable to that of the Grønnesby-Borgan test, and is applicable to settings other than proportional hazards. We also discuss how to apply the tests when cell sizes are small.
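As a rough illustration of the kind of grouped calibration test described above, the following Python sketch sorts subjects by predicted risk, splits them into groups, and compares the Kaplan-Meier estimate of the event probability at a fixed time horizon in each group with the mean predicted probability. The form of the statistic used here (squared difference divided by the Greenwood variance of the Kaplan-Meier estimate, summed over groups and referred to a chi-square distribution with one fewer degree of freedom than the number of groups) is an assumption made for illustration and is not stated in the text above. The function names are hypothetical, and this is not the authors' implementation.

```python
def km_failure_and_variance(times, events, horizon):
    """Kaplan-Meier failure probability at `horizon` and its Greenwood variance."""
    surv = 1.0
    greenwood_sum = 0.0
    # Distinct observed event times up to the horizon, in increasing order.
    event_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if e and u == t)
        surv *= 1.0 - deaths / at_risk
        if at_risk > deaths:
            greenwood_sum += deaths / (at_risk * (at_risk - deaths))
    return 1.0 - surv, surv ** 2 * greenwood_sum


def grouped_calibration_statistic(pred, times, events, horizon, n_groups=10):
    """Sum over risk groups of (observed - expected)^2 / Greenwood variance.

    `pred` holds predicted event probabilities by `horizon`. Returns the
    statistic and the assumed degrees of freedom (n_groups - 1).
    """
    order = sorted(range(len(pred)), key=lambda i: pred[i])
    size, extra = divmod(len(order), n_groups)
    stat, start = 0.0, 0
    for g in range(n_groups):
        stop = start + size + (1 if g < extra else 0)
        idx = order[start:stop]
        start = stop
        if not idx:
            raise ValueError("empty risk group; use fewer groups")
        observed, var = km_failure_and_variance(
            [times[i] for i in idx], [events[i] for i in idx], horizon
        )
        if var <= 0.0:
            # Typically a group with no events before the horizon (small cell).
            raise ValueError("zero variance in a risk group; consider merging groups")
        expected = sum(pred[i] for i in idx) / len(idx)
        stat += (observed - expected) ** 2 / var
    return stat, n_groups - 1
```

The explicit error on a zero-variance group reflects the sensitivity to small cell sizes noted above: a group with no events before the horizon gives no usable variance estimate, so groups would need to be merged or their number reduced before the statistic is computed.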