On Inverse Probability Weighting for Nonmonotone Missing at Random Data

ABSTRACT The development of coherent missing data models to account for nonmonotone missing at random (MAR) data by inverse probability weighting (IPW) remains largely unresolved to date. As a consequence, IPW has essentially been restricted to monotone MAR settings. We propose a class of models for nonmonotone missing data mechanisms that spans the MAR model while allowing the underlying full data law to remain unrestricted. For parametric specifications within the proposed class, we introduce an unconstrained maximum likelihood estimator of the missing data probabilities, which is easily implemented using existing software. To circumvent potential convergence issues with this procedure, we also introduce a constrained Bayesian approach to estimating the missing data process, which is guaranteed to yield inferences that respect all model restrictions. The efficiency of standard IPW estimation is improved by incorporating information from incomplete cases through an augmented estimating equation that is optimal within a large class of estimating equations. We investigate the finite-sample properties of the proposed estimators in extensive simulations and illustrate the new methodology in an application evaluating key correlates of preterm delivery for infants born to HIV-infected mothers in Botswana, Africa. Supplementary materials for this article are available online.
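
As a rough illustration of the standard IPW estimation mentioned in the abstract, a generic complete-case IPW estimating equation can be sketched as follows. The notation is an assumption made for this sketch and is not taken from the article: L is the full data vector, R is the vector of response indicators (R = 1 denoting a complete case), L_(r) is the part of L observed under pattern r, U(L; β) is a full-data estimating function with mean zero at the true β, and α indexes a parametric model for the missing data probabilities.

```latex
% Generic complete-case IPW estimating equation (illustrative notation, not the article's)
\sum_{i=1}^{n}
  \frac{\mathbb{1}\{R_i = \mathbf{1}\}}{\pi_{\mathbf{1}}(L_i; \hat{\alpha})}\,
  U(L_i; \beta) = 0,
\qquad
\pi_{\mathbf{1}}(L; \alpha) = \Pr(R = \mathbf{1} \mid L; \alpha).

% Under MAR, each incomplete pattern r depends only on the data observed under r, so
\pi_{\mathbf{1}}(L; \alpha)
  = 1 - \sum_{r \neq \mathbf{1}} \Pr\bigl(R = r \mid L_{(r)}; \alpha\bigr).
```

Under this sketch, the weighted equation is unbiased whenever the complete-case probability is correctly specified and bounded away from zero. The second display hints at why nonmonotone patterns are awkward: the pattern-specific probabilities must sum to less than one for every value of L, a restriction that a naive parametric specification need not respect.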
