Quantifying the impact of different approaches for handling continuous predictors on the performance of a prognostic model

Continuous predictors are routinely encountered when developing a prognostic model. Investigators, who are often non‐statisticians, must decide how to handle continuous predictors in their models. Categorising continuous measurements into two or more categories has been widely discredited, yet is still frequently done because of its simplicity, investigator ignorance of the potential impact and of suitable alternatives, or to facilitate model uptake. We examine the impact of three broad approaches for handling continuous predictors on the performance of a prognostic model: various methods of categorising predictors, modelling a linear relationship between the predictor and outcome, and modelling a nonlinear relationship using fractional polynomials or restricted cubic splines. We compare the performance (measured by the c‐index, calibration and net benefit) of prognostic models built using each approach, evaluating them on data separate from those used to build them. We show that categorising continuous predictors produces models with poor predictive performance and poor clinical usefulness. Categorising continuous predictors is unnecessary, biologically implausible and inefficient and should not be used in prognostic model development. © 2016 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
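
As a rough illustration of two of the performance measures named above, the sketch below computes a c‐index and a net benefit for a binary outcome and compares predicted risks from a continuous predictor with risks from the same predictor split into two groups. It is not the authors' analysis: the function names, the eight made-up observations and the binary-outcome simplification are all assumptions chosen for illustration. Net benefit uses the standard decision-curve form, true positives per person minus false positives per person weighted by the odds of the risk threshold.

```python
def c_index(risks, outcomes):
    """Share of (event, non-event) pairs where the event has the higher
    predicted risk; tied risks count as half. Binary outcomes only."""
    concordant = 0.0
    pairs = 0
    for r_event, y_event in zip(risks, outcomes):
        if y_event != 1:
            continue
        for r_none, y_none in zip(risks, outcomes):
            if y_none != 0:
                continue
            pairs += 1
            if r_event > r_none:
                concordant += 1.0
            elif r_event == r_none:
                concordant += 0.5
    return concordant / pairs


def net_benefit(risks, outcomes, threshold):
    """TP/n - FP/n * threshold / (1 - threshold), treating everyone whose
    predicted risk is at or above the threshold as test-positive."""
    n = len(outcomes)
    tp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 1)
    fp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Hypothetical data: eight people, ordered by a continuous predictor.
outcomes = [0, 0, 1, 0, 1, 0, 1, 1]

# Risks from a model that keeps the predictor continuous (made-up values).
continuous_risks = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]

# Risks after dichotomising at the median: each half gets its observed
# event rate (1/4 in the lower half, 3/4 in the upper half).
dichotomised_risks = [0.25] * 4 + [0.75] * 4

for label, risks in [("continuous", continuous_risks),
                     ("dichotomised", dichotomised_risks)]:
    print(label, c_index(risks, outcomes), net_benefit(risks, outcomes, 0.2))
```

In this toy data set the dichotomised risks cannot separate people within the same group, so tied pairs pull the c‐index down, and at a 20% threshold everyone is classed as high risk. The example only shows the mechanics; the size of any loss in real data depends on the predictor, the cut-point and the model.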
