Combining Multiple Imputation and Inverse-Probability Weighting

Summary. Two approaches commonly used to deal with missing data are multiple imputation (MI) and inverse-probability weighting (IPW). IPW is also used to adjust for unequal sampling fractions. MI is generally more efficient than IPW but more complex. Whereas IPW requires only a model for the probability that an individual has complete data (a univariate outcome), MI needs a model for the joint distribution of the missing data (a multivariate outcome) given the observed data. Inadequacies in either model may lead to important bias if large amounts of data are missing. A third approach combines MI and IPW to give a doubly robust estimator. A fourth approach (IPW/MI) also combines MI and IPW but, unlike doubly robust methods, imputes only isolated missing values and uses weights to account for the remaining larger blocks of unimputed missing data (such as arise in a cohort study subject to sample attrition) and/or for unequal sampling fractions. In this article, we examine the performance, in terms of bias and efficiency, of IPW/MI relative to MI and IPW alone, and we investigate whether Rubin’s rules variance estimator is valid for IPW/MI. We prove that this variance estimator is valid for IPW/MI for linear regression with an imputed outcome. We present simulations supporting its use in more general settings, and we demonstrate that IPW/MI can have advantages over the alternatives. IPW/MI is applied to data from the National Child Development Study.
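As a minimal sketch of the pooling step referred to above, the following Python function applies the standard Rubin's rules formulas to M per-imputation results. It is not the authors' code: the function name `rubin_pool` and the example numbers are hypothetical, and it assumes that each estimate and its variance have already been obtained from a weighted analysis of one imputed dataset.

```python
from statistics import mean, variance


def rubin_pool(estimates, variances):
    """Pool per-imputation results with Rubin's rules.

    estimates: point estimates, one per imputed dataset
    variances: the corresponding within-imputation variance estimates
    (assumed here to come from a weighted analysis of each dataset).
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need M >= 2 estimates with matching variances")
    q_bar = mean(estimates)      # pooled point estimate
    w = mean(variances)          # within-imputation variance
    b = variance(estimates)      # between-imputation variance (M - 1 denominator)
    t = w + (1 + 1 / m) * b      # total variance
    return q_bar, t


# Hypothetical results from M = 3 imputed datasets
pooled_estimate, total_variance = rubin_pool([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled_estimate, total_variance)
```

The total variance adds the average within-imputation variance to the between-imputation variance inflated by the factor 1 + 1/M, which accounts for using a finite number of imputations.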
