Poor performance of clinical prediction models: the harm of commonly applied methods.

OBJECTIVE To evaluate limitations of common statistical modeling approaches in deriving clinical prediction models and explore alternative strategies.

STUDY DESIGN AND SETTING A previously published model predicted the likelihood of having a mutation in germline DNA mismatch repair genes at the time of diagnosis of colorectal cancer. This model was based on a cohort where 38 mutations were found among 870 participants, with validation in an independent cohort with 35 mutations. The modeling strategy included stepwise selection of predictors from a pool of over 37 candidate predictors and dichotomization of continuous predictors. We simulated this strategy in small subsets of a large contemporary cohort (2,051 mutations among 19,866 participants) and made comparisons to other modeling approaches. All models were evaluated according to bias and discriminative ability (concordance index, c) in independent data.

RESULTS We found over 50% bias for five of six originally selected predictors, unstable model specification, and poor performance at validation (median c = 0.74). A small validation sample hampered stable assessment of performance. Model prespecification based on external knowledge and using continuous predictors led to better performance (c = 0.836 and c = 0.852 with 38 and 2,051 events, respectively).

CONCLUSION Prediction models perform poorly if based on small numbers of events and developed with common but suboptimal statistical approaches. Alternative modeling strategies to best exploit available predictive information need wider implementation, with collaborative research to increase sample sizes.
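The concordance index (c) used above as the measure of discriminative ability can be illustrated with a minimal sketch. For a binary outcome such as mutation carrier status, c is the proportion of all (event, non-event) pairs in which the event received the higher predicted risk, with ties counted as one half. The function name `concordance_index` and the toy data are assumptions made for illustration; they are not taken from the study, and this is not the authors' code.

```python
from itertools import product


def concordance_index(y_true, y_score):
    """c statistic for a binary outcome.

    y_true: 0/1 outcomes (1 = event, e.g. mutation found).
    y_score: predicted risks, in the same order as y_true.
    """
    events = [s for y, s in zip(y_true, y_score) if y == 1]
    nonevents = [s for y, s in zip(y_true, y_score) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    # Compare every event with every non-event.
    for e, n in product(events, nonevents):
        if e > n:
            concordant += 1.0   # event ranked above non-event
        elif e == n:
            concordant += 0.5   # tie counts as half
    return concordant / (len(events) * len(nonevents))


# Hypothetical toy data: 2 events and 3 non-events.
outcomes = [1, 0, 1, 0, 0]
risks = [0.9, 0.2, 0.4, 0.5, 0.1]
c = concordance_index(outcomes, risks)  # 5 of 6 pairs are concordant
```

With only a few events the number of comparable pairs is small, so c computed in a small validation sample can vary widely, which is consistent with the unstable performance assessment reported above.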
