Predictor characteristics necessary for building a clinically useful risk prediction model: a simulation study

Background: Compelled by the intuitive appeal of predicting each individual patient’s risk of an outcome, there is a growing interest in risk prediction models. While the statistical methods used to build prediction models are increasingly well understood, the literature offers little insight to researchers seeking to gauge a priori whether a prediction model is likely to perform well for their particular research question. The objective of this study was to inform the development of new risk prediction models by evaluating model performance under a wide range of predictor characteristics.

Methods: Data from all births to overweight or obese women in British Columbia, Canada from 2004 to 2012 (n = 75,225) were used to build a risk prediction model for preeclampsia. The data were then augmented with simulated predictors of the outcome with pre-set prevalence values and univariable odds ratios. We built 120 risk prediction models that included known demographic and clinical predictors, and one, three, or five of the simulated variables. Finally, we evaluated standard model performance criteria (discrimination, risk stratification capacity, calibration, and Nagelkerke’s R²) for each model.

Results: Findings from our models built with simulated predictors demonstrated the predictor characteristics required for a risk prediction model to adequately discriminate cases from non-cases and to adequately classify patients into clinically distinct risk groups. Several predictor characteristics can yield well-performing risk prediction models; however, these characteristics are not typical of predictor-outcome relationships in many population-based or clinical data sets. Novel predictors must be both strongly associated with the outcome and prevalent in the population to be useful for clinical prediction modeling (e.g., one predictor with prevalence ≥20% and odds ratio ≥8, or three predictors with prevalence ≥10% and odds ratios ≥4). Area under the receiver operating characteristic curve values of >0.8 were necessary to achieve reasonable risk stratification capacity.

Conclusions: Our findings provide a guide for researchers to estimate the expected performance of a prediction model before a model has been built based on the characteristics of available predictors.
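
To make the link between predictor prevalence, odds ratio, and discrimination concrete, the following is a minimal sketch and not the authors' simulation procedure. It considers a single binary predictor on its own (no baseline clinical model), solves for the 2×2 joint distribution implied by a chosen predictor prevalence, outcome prevalence, and univariable odds ratio, and then computes the area under the ROC curve for that one predictor. The function names `joint_table` and `single_marker_auc`, and the 10% outcome prevalence used in the loop, are illustrative assumptions and are not taken from the study.

```python
import math


def joint_table(prev_x, prev_y, odds_ratio):
    """Cell probabilities (p11, p10, p01, p00) for binary predictor X and outcome Y.

    The margins are fixed at prev_x = P(X=1) and prev_y = P(Y=1); p11 = P(X=1, Y=1)
    is chosen so that the cross-product ratio p11*p00 / (p10*p01) equals odds_ratio.
    """
    if odds_ratio <= 0:
        raise ValueError("odds_ratio must be positive")
    if math.isclose(odds_ratio, 1.0):
        # Independence: the joint probability is the product of the margins.
        p11 = prev_x * prev_y
    else:
        # Root of (OR-1)*t^2 - b*t + OR*prev_x*prev_y = 0 that lies in the valid range.
        b = 1.0 + (odds_ratio - 1.0) * (prev_x + prev_y)
        disc = b * b - 4.0 * odds_ratio * (odds_ratio - 1.0) * prev_x * prev_y
        p11 = (b - math.sqrt(disc)) / (2.0 * (odds_ratio - 1.0))
    p10 = prev_x - p11                      # X=1, Y=0
    p01 = prev_y - p11                      # X=0, Y=1
    p00 = 1.0 - prev_x - prev_y + p11       # X=0, Y=0
    return p11, p10, p01, p00


def single_marker_auc(prev_x, prev_y, odds_ratio):
    """AUC of one binary predictor: 0.5 * (1 + sensitivity - false positive rate)."""
    p11, p10, _, _ = joint_table(prev_x, prev_y, odds_ratio)
    sensitivity = p11 / prev_y
    false_positive_rate = p10 / (1.0 - prev_y)
    return 0.5 * (1.0 + sensitivity - false_positive_rate)


# Hypothetical outcome prevalence of 10% (an assumption for illustration only).
OUTCOME_PREVALENCE = 0.10
for prevalence in (0.05, 0.10, 0.20):
    for odds_ratio in (2.0, 4.0, 8.0):
        auc = single_marker_auc(prevalence, OUTCOME_PREVALENCE, odds_ratio)
        print(f"prevalence={prevalence:.2f}  OR={odds_ratio:.0f}  AUC={auc:.3f}")
```

Because this sketch evaluates one predictor in isolation, its output is not comparable to the study's reported thresholds, which refer to simulated predictors added to a model that already contains known demographic and clinical predictors.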
