Accounting for interactions and complex inter‐subject dependency in estimating treatment effect in cluster‐randomized trials with missing outcomes

Semi-parametric methods are often used to estimate intervention effects on correlated outcomes in cluster-randomized trials (CRTs). When outcomes are missing at random (MAR), inverse probability weighted (IPW) methods that incorporate baseline covariates can be used to handle informative missingness. Augmented generalized estimating equations (AUG) correct for imbalance in baseline covariates, but must be extended to handle MAR outcomes. However, when treatment interacts with baseline covariates, neither method alone produces consistent estimates of the marginal treatment effect unless the interaction model is correctly specified. We propose an AUG-IPW estimator that weights by the inverse of the probability of being a complete case and allows a different outcome model in each intervention arm. This estimator is doubly robust (DR): it gives consistent estimates if either the missing data process or the outcome model is correctly specified. We also consider covariate interference, which arises when an individual's outcome may depend on the covariates of other individuals. When interfering covariates are not modeled, the DR property prevents bias as long as covariate interference is not present simultaneously in the outcome and the missingness. We developed an R package implementing the proposed method. An extensive simulation study and an application to a CRT of an HIV risk-reduction intervention in South Africa illustrate the method.
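The doubly robust idea can be illustrated with a minimal sketch. This is not the estimator or the R package described above: it is a simplified augmented IPW mean computed separately in each arm, it ignores within-cluster correlation (no GEE working correlation), and it uses saturated stratum-level models on a single binary covariate. All function names, the data-generating model, and the parameter values are assumptions made for illustration only.

```python
import numpy as np

def dr_arm_mean(y, r, pi_hat, m_hat):
    """Augmented IPW estimate of the mean outcome in one arm.
    y: outcomes (NaN where missing), r: 1 if observed, 0 if missing,
    pi_hat: estimated P(R=1 | X), m_hat: estimated E[Y | X]."""
    y_obs = np.where(r == 1, y, 0.0)  # missing outcomes enter only through m_hat
    return np.mean(r * y_obs / pi_hat - (r - pi_hat) / pi_hat * m_hat)

def stratum_means(values, keep, x):
    """Mean of `values` over rows with keep == True, within each level of
    binary x, mapped back onto every row."""
    out = np.empty(len(x))
    for level in (0, 1):
        out[x == level] = values[keep & (x == level)].mean()
    return out

def dr_effect(y, r, a, x, correct_pi=True, correct_m=True):
    """Difference of arm-specific DR means, with a separate outcome model per arm.
    The 'incorrect' models are deliberately crude: a constant observation
    probability and an outcome model fixed at zero."""
    mu = {}
    for arm in (0, 1):
        idx = a == arm
        ya, ra, xa = y[idx], r[idx], x[idx]
        everyone = np.ones(len(ra), dtype=bool)
        if correct_pi:
            pi_hat = stratum_means(ra.astype(float), everyone, xa)
        else:
            pi_hat = np.full(len(ra), ra.mean())
        if correct_m:
            m_hat = stratum_means(ya, ra == 1, xa)
        else:
            m_hat = np.zeros(len(ra))
        mu[arm] = dr_arm_mean(ya, ra, pi_hat, m_hat)
    return mu[1] - mu[0]

# Hypothetical simulated trial: 400 clusters of 50, treatment assigned per cluster.
rng = np.random.default_rng(0)
n_clusters, cluster_size = 400, 50
n = n_clusters * cluster_size
a = np.repeat(rng.integers(0, 2, n_clusters), cluster_size)
x = rng.integers(0, 2, n)
# Treatment-by-covariate interaction; the marginal effect is 1 + 2 * P(X=1) = 2.
y_full = 1 + a + x + 2 * a * x + rng.normal(size=n)
# MAR: the probability of observing the outcome depends only on x.
r = (rng.random(n) < np.where(x == 1, 0.4, 0.9)).astype(int)
y = np.where(r == 1, y_full, np.nan)

est_both = dr_effect(y, r, a, x, correct_pi=True, correct_m=True)
est_pi_only = dr_effect(y, r, a, x, correct_pi=True, correct_m=False)
est_m_only = dr_effect(y, r, a, x, correct_pi=False, correct_m=True)
est_neither = dr_effect(y, r, a, x, correct_pi=False, correct_m=False)
print(est_both, est_pi_only, est_m_only, est_neither)
```

In this toy setup the estimate should stay close to the true marginal effect of 2 whenever at least one of the two working models is the stratified (correct) one. When both are wrong, the estimator collapses to a complete-case comparison, which is biased here because individuals with x = 1 are under-represented among the observed outcomes.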
