Sample size considerations for the external validation of a multivariable prognostic model: a resampling study

After developing a prognostic model, it is essential to evaluate the performance of the model in samples independent from those used to develop the model, which is often referred to as external validation. However, despite its importance, very little is known about the sample size requirements for conducting an external validation. Using a large real data set and resampling methods, we investigate the impact of sample size on the performance of six published prognostic models. Focussing on unbiased and precise estimation of performance measures (e.g. the c‐index, D statistic and calibration), we provide guidance on sample size for investigators designing an external validation study. Our study suggests that externally validating a prognostic model requires a minimum of 100 events and ideally 200 (or more) events. © 2015 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
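The resampling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure, and every specific in it is an assumption: the cohort is synthetic (the study used a large real data set), the linear predictor stands in for a published prognostic model, and the helper names `make_cohort`, `resample_with_events`, `harrell_c` and `c_index_spread` are hypothetical. The sketch draws repeated validation samples containing a fixed number of events, computes Harrell's c-index in each, and reports the mean and spread across replicates.

```python
import math
import random
import statistics


def make_cohort(n, rng, censor_time=1.0):
    """Synthetic cohort (assumption): returns (time, event, lp) tuples."""
    cohort = []
    for _ in range(n):
        lp = rng.gauss(0.0, 1.0)            # linear predictor of the "model"
        t = rng.expovariate(math.exp(lp))   # higher lp means higher hazard
        cohort.append((min(t, censor_time), t <= censor_time, lp))
    return cohort


def resample_with_events(cohort, n_events, rng):
    """Draw a sample with exactly n_events events, sampling with replacement
    and keeping the cohort's ratio of non-events to events (assumption)."""
    events = [r for r in cohort if r[1]]
    nonevents = [r for r in cohort if not r[1]]
    n_non = round(n_events * len(nonevents) / len(events))
    return rng.choices(events, k=n_events) + rng.choices(nonevents, k=n_non)


def harrell_c(sample):
    """Harrell's c-index: among usable pairs (the shorter time is an event),
    the proportion where the shorter time has the higher lp; lp ties count 0.5."""
    concordant = 0.0
    usable = 0
    for t_i, e_i, lp_i in sample:
        if not e_i:
            continue
        for t_j, _, lp_j in sample:
            if t_i < t_j:
                usable += 1
                if lp_i > lp_j:
                    concordant += 1.0
                elif lp_i == lp_j:
                    concordant += 0.5
    return concordant / usable


def c_index_spread(cohort, n_events, n_reps, rng):
    """Mean and standard deviation of the c-index over n_reps resamples."""
    values = [harrell_c(resample_with_events(cohort, n_events, rng))
              for _ in range(n_reps)]
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    rng = random.Random(2015)
    cohort = make_cohort(2000, rng)
    for n_events in (20, 50, 100):
        mean_c, sd_c = c_index_spread(cohort, n_events, 50, rng)
        print(f"events={n_events:4d}  mean c={mean_c:.3f}  SD={sd_c:.3f}")
```

Running the loop at increasing numbers of events shows how the spread of the estimated c-index changes with sample size, which is the kind of evidence the sample size guidance above rests on. The same loop could be repeated for the D statistic or a calibration measure.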
