Dealing with missing predictor values when applying clinical prediction models.

BACKGROUND Prediction models combine patient characteristics and test results to predict the presence of a disease or the occurrence of a future event. When a test result (predictor) is unavailable, users applying a prediction model need a strategy to deal with the missing value. We evaluated 6 such strategies.

METHODS We developed and validated (in 1295 and 532 primary care patients, respectively) a model to predict the risk of deep venous thrombosis. In an application set (259 patients), we mimicked 3 situations in which (1) an important predictor (D-dimer test), (2) a weaker predictor (difference in calf circumference), and (3) both predictors simultaneously were missing. The 6 strategies to deal with missing values were (1) ignoring the predictor, (2) overall mean imputation, (3) subgroup mean imputation, (4) multiple imputation, (5) applying a submodel, derived from the development set, that includes only the observed predictors, and (6) the "one-step-sweep" method. We compared the model's discriminative ability (expressed by the ROC area) with the true ROC area (no missing values), and the model's estimated calibration slope and intercept with the ideal values of 1 and 0, respectively.

RESULTS Ignoring the predictor led to the worst discrimination and multiple imputation to the best. Multiple imputation led to calibration intercepts closest to the true value. The effect of the strategies on the slope differed between the 3 scenarios.

CONCLUSIONS Multiple imputation is preferred if a predictor value is missing.
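As a rough illustration of the two simplest strategies and of the discrimination measure, the following Python sketch applies a logistic prediction model to a patient with a missing predictor. Everything specific here is an assumption made for the example: the coefficients, the development-set means, and the predictor names `d_dimer_abnormal` and `calf_diff_3cm` are hypothetical and are not taken from the study. "Ignoring the predictor" is interpreted as dropping its term from the linear predictor, which may differ from the authors' exact implementation.

```python
import math

# Hypothetical logistic model; coefficients are illustrative, not from the study.
INTERCEPT = -3.0
COEFS = {"d_dimer_abnormal": 2.0, "calf_diff_3cm": 1.0}
# Hypothetical development-set means, used for overall mean imputation.
DEV_MEANS = {"d_dimer_abnormal": 0.6, "calf_diff_3cm": 0.4}


def _logistic(lp):
    """Convert a linear predictor into a predicted risk."""
    return 1.0 / (1.0 + math.exp(-lp))


def risk_ignore(patient):
    """Strategy 1 (assumed reading): drop the term of any predictor that is None."""
    lp = INTERCEPT + sum(COEFS[k] * v for k, v in patient.items() if v is not None)
    return _logistic(lp)


def risk_mean_imputed(patient):
    """Strategy 2: replace a missing value with the development-set mean."""
    lp = INTERCEPT + sum(
        COEFS[k] * (DEV_MEANS[k] if v is None else v) for k, v in patient.items()
    )
    return _logistic(lp)


def roc_area(risks, outcomes):
    """ROC area: share of (case, non-case) pairs ranked correctly; ties count 0.5."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    controls = [r for r, y in zip(risks, outcomes) if y == 0]
    score = sum(
        1.0 if c > n else 0.5 if c == n else 0.0 for c in cases for n in controls
    )
    return score / (len(cases) * len(controls))


# Example patient whose D-dimer result is unavailable.
patient = {"d_dimer_abnormal": None, "calf_diff_3cm": 1}
print(risk_ignore(patient), risk_mean_imputed(patient))
```

With a positive coefficient, dropping the term yields a lower predicted risk than imputing the mean, which suggests why ignoring a strong predictor can distort both discrimination and calibration. Multiple imputation, the submodel approach, and the "one-step-sweep" method require the development data and are not sketched here.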
