On weighting approaches for missing data

We review the class of inverse probability weighting (IPW) approaches for the analysis of missing data under various missing data patterns and mechanisms. IPW methods rely on the intuitive idea of creating a pseudo-population of weighted copies of the complete cases in order to remove the selection bias introduced by the missing data. However, different weighting approaches are required depending on the missing data pattern and mechanism. To motivate the approach, we begin with a uniform missing data pattern (i.e., a scalar missing indicator that records whether or not the full data are observed). We then generalise to more complex settings. Our goal is to provide a conceptual overview of existing IPW approaches and to illustrate the connections and differences among them.
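To make the pseudo-population idea concrete, the following is a minimal sketch for the simplest setting only: a scalar missing indicator, an outcome that is missing at random given one fully observed binary covariate, and a population mean as the target. The simulated data, the variable names (`x`, `y_full`, `r`, `pi_hat`) and the helper `ipw_mean` are illustrative assumptions, not taken from the review. Each complete case is weighted by the inverse of its estimated probability of being observed, so that under-represented cases count for more.

```python
import numpy as np

def ipw_mean(y, r, pi_hat):
    """Normalised IPW mean: complete cases (r == 1) weighted by 1 / pi_hat."""
    w = r / pi_hat                        # weight is zero for incomplete cases
    y_filled = np.where(r == 1, y, 0.0)   # placeholder so NaNs never enter the sum
    return np.sum(w * y_filled) / np.sum(w)

# Hypothetical simulation (assumed for illustration only).
rng = np.random.default_rng(0)
n = 20_000
x = rng.integers(0, 2, size=n)                 # fully observed binary covariate
y_full = 1.0 + 2.0 * x + rng.normal(size=n)    # full-data outcome, true mean 2.0
pi_true = np.where(x == 1, 0.3, 0.9)           # P(observed | x): depends on x only
r = rng.binomial(1, pi_true)                   # scalar missing indicator
y_obs = np.where(r == 1, y_full, np.nan)       # outcome is missing when r == 0

# Estimate the observation probability within each level of x.
pi_hat = np.where(x == 1, r[x == 1].mean(), r[x == 0].mean())

cc_mean = y_obs[r == 1].mean()                 # complete-case mean (selection-biased)
ipw = ipw_mean(y_obs, r, pi_hat)               # weighted pseudo-population mean

print(f"complete-case mean: {cc_mean:.3f}")
print(f"IPW mean:           {ipw:.3f}")
```

In this assumed setup, cases with `x == 1` are observed less often and have larger outcomes, so the complete-case mean falls below the full-data mean of 2.0, while the weighted mean should land close to it. The sketch uses the normalised form of the weights, which is one common choice; more complex missing data patterns and mechanisms call for different weights.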
