American Journal of Epidemiology, Practice of Epidemiology

Multiple Imputation for Missing Data: Fully Conditional Specification versus Multivariate Normal Imputation

Statistical analysis in epidemiologic studies is often hindered by missing data, and multiple imputation is increasingly being used to handle this problem. In a simulation study, the authors compared 2 imputation methods that are widely available in standard software: fully conditional specification (FCS), also known as "chained equations," and multivariate normal imputation (MVNI). The authors created data sets of 1,000 observations to simulate a cohort study, and missing data were induced under 3 missing-data mechanisms. Imputations were performed using FCS (Royston's "ice") and MVNI (Schafer's NORM) in Stata (Stata Corporation, College Station, Texas), with transformations or prediction matching being used to manage nonnormality in the continuous variables. Inferences for a set of regression parameters were compared between these approaches and a complete-case analysis. As expected, both FCS and MVNI were generally less biased than complete-case analysis, and the 2 methods produced similar results despite the presence of binary and ordinal variables that clearly did not follow a normal distribution. Ignoring skewness in a continuous covariate led to large biases and poor coverage for the corresponding regression parameter under both approaches, although inferences for other parameters were largely unaffected. These findings provide reassurance that similar results can be expected from FCS and MVNI in a standard regression analysis involving variously scaled variables.
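The comparison between multiple imputation and complete-case analysis can be illustrated with a toy example. The sketch below is not the authors' simulation design and uses neither "ice" nor NORM: it assumes a single normally distributed covariate `x`, a linear outcome `y`, and one hypothetical missing-data mechanism in which `x` is more likely to be missing when the fully observed `y` is large. With only one incomplete variable, FCS and MVNI reduce to essentially the same normal regression imputation, so the sketch shows only the contrast with complete-case analysis. All names, the seed, the coefficients, and the number of imputations `m` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2010)  # arbitrary seed, for reproducibility only


def design(v):
    """Design matrix with an intercept column and one predictor."""
    return np.column_stack([np.ones_like(v), v])


def ols(X, y):
    """Return OLS coefficients, (X'X)^-1, residual variance, and residual df."""
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    df = X.shape[0] - X.shape[1]
    return beta, xtx_inv, resid @ resid / df, df


# Simulated "cohort" of 1,000 observations (assumed model, not the paper's).
n, m, true_slope = 1000, 20, 1.0
x = rng.normal(size=n)
y = 1.0 + true_slope * x + rng.normal(size=n)

# Hypothetical mechanism: x is missing more often when the observed y is high.
p_miss = 1.0 / (1.0 + np.exp(-2.0 * (y - 1.0)))
miss = rng.random(n) < p_miss
obs = ~miss

# Complete-case analysis: regress y on x using only rows with x observed.
cc_slope = ols(design(x[obs]), y[obs])[0][1]

# Imputation model: normal linear regression of x on y, fitted to complete cases.
b_hat, xtx_inv, s2_hat, df = ols(design(y[obs]), x[obs])

estimates, variances = [], []
for _ in range(m):
    # Draw the imputation-model parameters from their approximate posterior.
    sigma2 = s2_hat * df / rng.chisquare(df)
    beta = rng.multivariate_normal(b_hat, sigma2 * xtx_inv)
    # Fill in missing x with a random draw around the predicted mean.
    x_imp = x.copy()
    x_imp[miss] = design(y[miss]) @ beta + rng.normal(
        scale=np.sqrt(sigma2), size=miss.sum()
    )
    # Fit the analysis model to the completed data set.
    b, v, s2, _ = ols(design(x_imp), y)
    estimates.append(b[1])
    variances.append(s2 * v[1, 1])

# Rubin's rules: combine within- and between-imputation variance.
mi_slope = np.mean(estimates)
within = np.mean(variances)
between = np.var(estimates, ddof=1)
mi_se = np.sqrt(within + (1 + 1 / m) * between)

print(f"true slope          : {true_slope:.3f}")
print(f"complete-case slope : {cc_slope:.3f}")
print(f"MI slope (SE)       : {mi_slope:.3f} ({mi_se:.3f})")
```

Because selection into the complete cases depends on the outcome, the complete-case slope in this setup is pulled toward zero, whereas the pooled multiple-imputation estimate should fall close to the true value. The binary, ordinal, and skewed variables examined in the study are deliberately left out of this sketch.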
