Estimating propensity scores with missing covariate data using general location mixture models

In many observational studies, analysts estimate causal effects using propensity scores, e.g., by matching, sub-classifying, or inverse probability weighting based on the scores. Estimating propensity scores is complicated when some covariate values are missing. Analysts can use multiple imputation to create completed data sets from which propensity scores can be estimated. We propose a general location mixture model for imputations that assumes that the control units are a latent mixture of (i) units whose covariates are drawn from the same distributions as the treated units' covariates and (ii) units whose covariates are drawn from different distributions. This formulation reduces the influence that control units lying outside the treated units' region of the covariate space have on estimating the imputation model's parameters, which can yield more plausible imputations. In turn, this can produce more reliable estimates of propensity scores and better balance in the true covariate distributions when matching or sub-classifying. We illustrate the benefits of the latent class modeling approach with simulations and with an observational study of the effect of breastfeeding on children's cognitive abilities.
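The latent-class idea can be illustrated with a much-simplified sketch under stated assumptions: a single continuous, fully observed covariate instead of a general location model over mixed categorical and continuous variables, and maximum-likelihood fitting by EM (a swapped-in technique chosen for brevity) rather than full Bayesian inference. Treated units are assumed to belong entirely to the "same" component N(mu_same, sd_same); controls are a mixture with weight pi on that component and 1 - pi on a separate normal component. Controls far from the treated region receive small responsibilities, so they contribute little to the estimates of mu_same and sd_same. All names (`fit_mixture`, `responsibility`, `impute_control`) are hypothetical, and the imputation draw ignores parameter uncertainty, so it is not a proper multiple-imputation step.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class MixtureFit:
    mu_same: float  # mean shared by treated units and treated-like controls
    sd_same: float
    mu_diff: float  # mean of controls drawn from a different distribution
    sd_diff: float
    pi: float       # share of controls in the treated-like component


def log_normal_pdf(x: float, mu: float, sd: float) -> float:
    z = (x - mu) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def weighted_mean_sd(xs: Sequence[float], ws: Sequence[float]) -> Tuple[float, float]:
    total = sum(ws)
    mu = sum(w * x for x, w in zip(xs, ws)) / total
    var = sum(w * (x - mu) ** 2 for x, w in zip(xs, ws)) / total
    return mu, math.sqrt(max(var, 1e-6))  # variance floor avoids degenerate fits


def responsibility(x: float, fit: MixtureFit) -> float:
    """Posterior probability that a control with covariate x is treated-like."""
    la = math.log(fit.pi) + log_normal_pdf(x, fit.mu_same, fit.sd_same)
    lb = math.log(1.0 - fit.pi) + log_normal_pdf(x, fit.mu_diff, fit.sd_diff)
    d = lb - la  # computed in log space so extreme x values do not underflow
    if d >= 0:
        e = math.exp(-d)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(d))


def fit_mixture(treated: List[float], controls: List[float], n_iter: int = 200) -> MixtureFit:
    mu_t, sd_t = weighted_mean_sd(treated, [1.0] * len(treated))
    mu_c, sd_c = weighted_mean_sd(controls, [1.0] * len(controls))
    fit = MixtureFit(mu_t, sd_t, mu_c, sd_c, 0.5)
    for _ in range(n_iter):
        # E-step: latent class probabilities for controls only.
        r = [responsibility(x, fit) for x in controls]
        # M-step: treated units always count fully toward the shared component;
        # controls count in proportion to their responsibilities.
        mu_s, sd_s = weighted_mean_sd(treated + controls, [1.0] * len(treated) + r)
        w_diff = [1.0 - ri for ri in r]
        if sum(w_diff) > 1e-9:
            mu_d, sd_d = weighted_mean_sd(controls, w_diff)
        else:
            mu_d, sd_d = fit.mu_diff, fit.sd_diff
        pi = min(max(sum(r) / len(r), 1e-6), 1.0 - 1e-6)
        fit = MixtureFit(mu_s, sd_s, mu_d, sd_d, pi)
    return fit


def impute_control(fit: MixtureFit, rng: random.Random) -> float:
    """Draw a latent class, then a covariate value from that class (no parameter draw)."""
    if rng.random() < fit.pi:
        return rng.gauss(fit.mu_same, fit.sd_same)
    return rng.gauss(fit.mu_diff, fit.sd_diff)


if __name__ == "__main__":
    rng = random.Random(0)
    treated = [rng.gauss(0, 1) for _ in range(200)]
    controls = [rng.gauss(0, 1) for _ in range(100)] + [rng.gauss(5, 1) for _ in range(100)]
    fit = fit_mixture(treated, controls)
    print(fit)
    print(impute_control(fit, rng))
```

In the simulated usage above, controls near 5 lie outside the treated region; because their responsibilities are close to zero, the shared component's mean and standard deviation are driven by the treated units and the overlapping controls.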