Introduction to multiple imputation for dealing with missing data

Missing data are common in both observational and experimental studies. Multiple imputation (MI) is a two-stage approach: missing values are first imputed several times using a statistical model based on the available data, and inference is then combined across the completed datasets. This approach is becoming increasingly popular for handling missing data. In this paper, we introduce MI and discuss when it can be a useful method for handling missing data, along with its drawbacks. We illustrate MI by exploring the association between current asthma status and forced expiratory volume in 1 s, after adjustment for potential confounders, using data from a population-based longitudinal cohort study.
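The two stages described above can be sketched in a few lines of Python. This is a minimal illustration on synthetic data, not the analysis reported in the paper: the variables `x` and `y`, the helper names `impute_once` and `pool_rubin`, the bootstrap-based regression imputation, and the choice of m = 20 imputations are all assumptions made for the example. The pooling step follows Rubin's rules, in which the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance.

```python
import numpy as np


def pool_rubin(estimates, variances):
    """Combine m completed-data estimates and their variances (Rubin's rules)."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    if m < 2:
        raise ValueError("need at least two imputations to pool")
    q_bar = q.mean()              # pooled point estimate
    within = u.mean()             # average within-imputation variance
    between = q.var(ddof=1)       # between-imputation variance
    total = within + (1 + 1 / m) * between
    return q_bar, total


def impute_once(x, y, rng):
    """Stage 1 (hypothetical helper): fill missing y from a regression on x.

    The regression is fitted to a bootstrap sample of the complete cases so
    that parameter uncertainty is carried into the imputed values, and
    residual noise is added to each imputed value.
    """
    obs = ~np.isnan(y)
    idx = rng.choice(np.flatnonzero(obs), size=int(obs.sum()), replace=True)
    design = np.column_stack([np.ones(idx.size), x[idx]])
    beta, *_ = np.linalg.lstsq(design, y[idx], rcond=None)
    resid = y[idx] - design @ beta
    sigma = np.sqrt(resid @ resid / (idx.size - 2))
    filled = y.copy()
    miss = ~obs
    filled[miss] = (beta[0] + beta[1] * x[miss]
                    + rng.normal(0.0, sigma, int(miss.sum())))
    return filled


# Synthetic data: y depends on x, and y is more likely to be missing when x
# is large, so missingness depends only on the fully observed x.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n)
p_miss = 1 / (1 + np.exp(-x))
y_obs = np.where(rng.random(n) < p_miss, np.nan, y)

# Stage 1: create m completed datasets. Stage 2: analyse each, then pool.
m = 20
estimates, variances = [], []
for _ in range(m):
    y_c = impute_once(x, y_obs, rng)
    estimates.append(y_c.mean())             # completed-data estimate of E[y]
    variances.append(y_c.var(ddof=1) / n)    # its estimated variance

pooled_mean, total_var = pool_rubin(estimates, variances)
print(f"complete-case mean: {np.nanmean(y_obs):.3f}")
print(f"MI pooled mean:     {pooled_mean:.3f} (SE {np.sqrt(total_var):.3f})")
```

In this setup the complete-case mean would be expected to be biased, because cases with large `x` (and hence large `y`) are under-represented among the observed values, whereas the imputation model uses `x` to recover them. In practice, established MI software would normally be used rather than hand-written code of this kind.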
