A clinician’s guide for developing a prediction model: a case study using real-world data of patients with castration-resistant prostate cancer

Purpose: With increasing interest in basing treatment decisions on risk prediction models, it is essential for clinicians to understand the steps involved in developing and interpreting such models.

Methods: A retrospective registry from 20 Dutch hospitals, with data on patients treated for castration-resistant prostate cancer, was used to guide clinicians through the steps of developing a prediction model. The model of choice was the Cox proportional hazards model.

Results: Using this example dataset, several essential steps in prediction modelling are discussed, including coding of predictors, handling of missing values, interactions, model specification, and model performance. Least Absolute Shrinkage and Selection Operator (LASSO) regression is described as an advanced method for selecting main effects. Furthermore, the assumptions of the Cox proportional hazards model are discussed, along with how violations of the proportional hazards assumption can be handled using time-varying coefficients.

Conclusion: This study provides a comprehensive, detailed guide intended to bridge the gap between statistician and clinician, based on a large dataset of real-world patients treated for castration-resistant prostate cancer.
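
For orientation, the three modelling ingredients named above can be written out in their standard textbook form. This is a sketch only, not the notation or exact specification used in the study: the symbols x, β, λ, γ and the time transform g(t) are assumptions introduced here for illustration, and ties in event times are ignored.

```latex
% Cox proportional hazards model: baseline hazard h_0(t) scaled by the predictors
h(t \mid x) = h_0(t)\,\exp\!\left(\beta_1 x_1 + \dots + \beta_p x_p\right)

% Log partial likelihood (no tied event times); \delta_i = 1 marks an observed event,
% R(t_i) is the set of patients still at risk at time t_i
\ell(\beta) = \sum_{i:\,\delta_i = 1}
  \left[ \beta^{\top} x_i - \log \sum_{k \in R(t_i)} \exp\!\left(\beta^{\top} x_k\right) \right]

% LASSO: penalised estimate; a larger \lambda shrinks more coefficients to exactly zero
\hat{\beta}_{\text{lasso}} = \arg\max_{\beta}
  \left\{ \ell(\beta) - \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \right\}

% Time-varying coefficient for a predictor x_j that violates proportional hazards;
% g(t) is a chosen function of time, for example g(t) = \log t (an assumption here)
\beta_j(t) = \beta_j + \gamma_j\, g(t)
```

Under proportional hazards the hazard ratio exp(β_j) for a predictor is constant over follow-up. With a time-varying coefficient it becomes exp(β_j + γ_j g(t)), so γ_j = 0 recovers the ordinary Cox model.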
