Testing Calibration of Cox Survival Models at Extremes of Event Risk

Risk prediction models can translate genetic association findings into tools for clinical decision-making. Most models are evaluated on their ability to discriminate, while calibration is largely overlooked in applications. A model that discriminates well in training data but is not calibrated to produce unbiased risk estimates can perform poorly in new patient populations. Poor calibration can arise from missing covariates, such as genetic interactions that are unknown or unmeasured. We demonstrate that omitting interactions can increase bias in predicted risk for patients at the tails of the risk distribution, i.e., the patients most likely to be affected by clinical decision-making. We propose a new calibration test for Cox risk-prediction models that aggregates martingale residuals for subjects in the extreme high- and low-risk groups; the test statistic is the maximum obtained by varying which risk groups are included in the extremes. To estimate the empirical significance of the test statistic, we simulate from a Gaussian distribution using the covariance matrix of the grouped sums of martingale residuals. Simulations show that the new test controls type 1 error and has improved power over a conventional goodness-of-fit test when risk prediction deviates at the tails of the risk distribution. We apply the method in developing a prediction model for risk of cystic fibrosis-related diabetes. Our study highlights the importance of assessing both calibration and discrimination in predictive modeling, and provides a complementary tool for assessing risk model calibration.
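The general shape of the test described above (grouped residual sums, a maximum over choices of extreme groups, and a Gaussian simulation for significance) can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the authors' implementation. It assumes the per-group sums of martingale residuals (`sums`, ordered from lowest to highest predicted risk) and their covariance matrix (`V`) have already been computed from a fitted Cox model. The way the low and high extremes are combined here (a sum of squared standardized totals) is an illustrative choice that the abstract does not specify. All function and parameter names are hypothetical.

```python
import math
import random


def cholesky_psd(V, tol=1e-12):
    """Lower-triangular L with L L^T = V, tolerating zero pivots (PSD input)."""
    n = len(V)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        pivot = V[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if pivot <= tol:
            continue  # zero-variance direction: leave column j at zero
        L[j][j] = math.sqrt(pivot)
        for i in range(j + 1, n):
            cross = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = (V[i][j] - cross) / L[j][j]
    return L


def max_extreme_stat(sums, V, max_k):
    """Maximum over k = 1..max_k of the statistic built from the k lowest-risk
    and k highest-risk groups (assumed form: sum of squared standardized totals)."""
    G = len(sums)
    best = 0.0
    for k in range(1, max_k + 1):
        stat = 0.0
        for idx in (list(range(k)), list(range(G - k, G))):
            total = sum(sums[i] for i in idx)
            var = sum(V[i][j] for i in idx for j in idx)
            if var > 0:
                stat += total ** 2 / var
        best = max(best, stat)
    return best


def extreme_calibration_test(sums, V, max_k, n_sim=2000, seed=0):
    """Observed maximum statistic and its empirical p-value, obtained by
    drawing group sums from N(0, V) and recomputing the maximum each time."""
    rng = random.Random(seed)
    L = cholesky_psd(V)
    G = len(sums)
    observed = max_extreme_stat(sums, V, max_k)
    exceed = 0
    for _ in range(n_sim):
        z = [rng.gauss(0.0, 1.0) for _ in range(G)]
        draw = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(G)]
        if max_extreme_stat(draw, V, max_k) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_sim + 1)


if __name__ == "__main__":
    # Made-up inputs for illustration only: 10 risk groups, diagonal covariance.
    toy_sums = [6.0, 3.0, 0.5, -0.2, 0.1, -0.4, 0.3, -0.6, -3.5, -5.2]
    toy_V = [[4.0 if i == j else 0.0 for j in range(10)] for i in range(10)]
    stat, p = extreme_calibration_test(toy_sums, toy_V, max_k=3)
    print(f"max statistic = {stat:.3f}, empirical p = {p:.4f}")
```

Because the maximum is recomputed on every simulated draw, the empirical p-value accounts for the search over which groups count as extreme. In a real analysis the covariance of the grouped sums would not be diagonal, since martingale residuals from a fitted Cox model are correlated across groups; the diagonal matrix in the toy example is used only to keep the inputs readable.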
