Development and validation of a prediction model with missing predictor data: a practical approach

OBJECTIVE: To illustrate the sequence of steps needed to develop and validate a clinical prediction model when missing predictor values have been multiply imputed.

STUDY DESIGN AND SETTING: We used data from consecutive primary care patients suspected of deep venous thrombosis (DVT) to develop and validate a diagnostic model for the presence of DVT. Missing values were imputed 10 times with MICE (multivariate imputation by chained equations), a conditional imputation method. After predictors and transformations of continuous predictors had been selected using three different methods, we estimated regression coefficients and performance measures.

RESULTS: The three methods for selecting predictors and transformations of continuous predictors gave similar results. Once predictors and transformations had been selected, Rubin's rules could easily be applied to estimate regression coefficients and performance measures.

CONCLUSION: We provide a practical approach to model development and validation with multiply imputed data.
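Rubin's rules combine the m per-imputation estimates of a parameter into one pooled estimate. The pooled standard error reflects both within-imputation and between-imputation variability. The following is a minimal Python sketch under stated assumptions. The function name `pool_rubin` and its dictionary return value are hypothetical. The sketch assumes the same selected model has already been fitted in each imputed dataset, and it pools a single scalar parameter. The degrees of freedom follow Rubin's original large-sample formula. Small-sample refinements exist but are not shown.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one scalar parameter across m imputed datasets with Rubin's rules.

    estimates: point estimates Q_1..Q_m, e.g. one logistic regression
               coefficient fitted separately in each imputed dataset.
    variances: the matching squared standard errors U_1..U_m.
    (Hypothetical helper for illustration, not from the original study.)
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and one variance per estimate")

    q_bar = mean(estimates)        # pooled point estimate
    u_bar = mean(variances)        # average within-imputation variance
    if u_bar <= 0:
        raise ValueError("within-imputation variances must be positive")
    b = variance(estimates)        # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b    # total variance of the pooled estimate

    # Relative increase in variance due to missing data.
    r = (1 + 1 / m) * b / u_bar
    # Rubin's (1987) degrees of freedom: (m - 1) * (1 + 1/r)^2.
    # This is infinite when the imputations agree exactly (b == 0).
    df = (m - 1) * (1 + 1 / r) ** 2 if r > 0 else math.inf

    return {"estimate": q_bar, "se": math.sqrt(t), "df": df, "riv": r}


if __name__ == "__main__":
    # Made-up coefficients and variances for m = 10 imputations (illustrative only).
    coefs = [0.50, 0.52, 0.48, 0.51, 0.49, 0.50, 0.53, 0.47, 0.50, 0.50]
    ses_squared = [0.01] * 10
    pooled = pool_rubin(coefs, ses_squared)
    print(pooled["estimate"], pooled["se"], pooled["df"])
```

The same function could also be applied to a performance measure such as the c-statistic, using each imputed dataset's estimate and its squared standard error. Some analysts first transform the c-statistic, for example to the logit scale, and back-transform after pooling. This is an assumption for illustration, not a description of the authors' exact procedure.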
