Methods for Handling Missing Variables in Risk Prediction Models.

Prediction models should be externally validated before being used in clinical practice, yet many published prediction models have never been validated. Uncollected predictor variables in otherwise suitable validation cohorts are the main factor precluding external validation. We used individual patient data from 9 cohort studies conducted in the United States, Europe, and Latin America that included 7,892 patients with chronic obstructive pulmonary disease who were enrolled between 1981 and 2006. Data on 3-year mortality and the predictors of age, dyspnea, and airflow obstruction were available. We simulated missing data by omitting the predictor dyspnea cohort-wide, and we present 6 methods for handling the missing variable. We assessed model performance in terms of discriminative ability and calibration and by using 2 vignette scenarios. We showed that any of the imputation methods outperforms omitting the cohort from the validation, which is a commonly used approach. Measured against the benchmark of the full data set with no variable missing, multiple imputation with fixed or random intercepts for cohorts was the best approach for imputing the systematically missing predictor. The findings of this study may facilitate the use of cohort studies that do not include all predictors and pave the way for more widespread external validation of prediction models, even if 1 or more predictors of the model are systematically missing.
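The two performance criteria named above, discriminative ability and calibration, can be illustrated with a minimal sketch. It is not the study's own code: the function names, the choice of the c-statistic, Brier score, and calibration-in-the-large as summary measures, and the toy risks and outcomes are all assumptions made for illustration.

```python
# Illustrative sketch only: hypothetical helper names and toy data,
# not the analysis code or the data used in the study.

def c_statistic(risks, outcomes):
    """Discrimination: probability that a patient who had the event
    received a higher predicted risk than one who did not (ties count 0.5)."""
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    nonevents = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in nonevents:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(nonevents))


def brier_score(risks, outcomes):
    """Overall accuracy: mean squared difference between predicted
    risk and observed outcome (lower is better)."""
    return sum((r - y) ** 2 for r, y in zip(risks, outcomes)) / len(risks)


def calibration_in_the_large(risks, outcomes):
    """Calibration: observed event rate minus mean predicted risk
    (0 means predictions are correct on average)."""
    return sum(outcomes) / len(outcomes) - sum(risks) / len(risks)


# Toy validation cohort: predicted 3-year mortality risks and observed deaths.
toy_risks = [0.2, 0.6, 0.4, 0.8]
toy_outcomes = [0, 0, 1, 1]

print("c-statistic:", c_statistic(toy_risks, toy_outcomes))
print("Brier score:", brier_score(toy_risks, toy_outcomes))
print("calibration-in-the-large:", calibration_in_the_large(toy_risks, toy_outcomes))
```

In a comparison like the one described, such measures would be computed once on the complete data (the benchmark) and again after each method of handling the missing predictor, so that the methods can be ranked by how closely they reproduce the benchmark values.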
