Two-Stage TMLE to reduce bias and improve efficiency in cluster randomized trials

Cluster randomized trials (CRTs) randomly assign an intervention to groups of individuals (e.g., clinics or communities) and measure outcomes on individuals in those groups. While offering many advantages, this experimental design introduces challenges that are only partially addressed by existing analytic approaches. First, outcomes are often missing for some individuals within clusters. Failing to appropriately adjust for differential outcome measurement can result in biased estimates and inference. Second, CRTs often randomize limited numbers of clusters, resulting in chance imbalances on baseline outcome predictors between arms. Failing to adaptively adjust for these imbalances and other predictive covariates can result in efficiency losses. To address these methodological gaps, we propose and evaluate a novel two-stage targeted minimum loss-based estimator (TMLE) that adjusts for baseline covariates in a manner that optimizes precision, after controlling for baseline and postbaseline causes of missing outcomes. Finite sample simulations illustrate that our approach can nearly eliminate bias due to differential outcome measurement, while existing CRT estimators yield misleading results and inferences. Application to real data from the SEARCH community randomized trial demonstrates the gains in efficiency afforded through adaptive adjustment for baseline covariates, after controlling for missingness on individual-level outcomes.
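The two-stage structure described above can be illustrated with a deliberately simplified sketch. It is not the estimator proposed in the paper: Stage 1 here uses plain stratification on a single discrete individual-level covariate in place of TMLE, and Stage 2 uses ordinary least squares with one cluster-level covariate in place of adaptive covariate selection. All function and variable names (`stage1_cluster_mean`, `stage2_effect`, `w`, `delta`, `e`) are hypothetical and chosen only for this illustration.

```python
import numpy as np


def stage1_cluster_mean(w, delta, y):
    """Stage 1 (simplified): missingness-adjusted mean outcome for ONE cluster.

    w     : discrete individual-level covariate (assumed cause of missingness)
    delta : 1 if the outcome was measured, 0 if missing
    y     : outcomes; entries where delta == 0 are ignored (may be NaN)

    Assumption: outcomes are missing at random within strata of w.
    The stratum-specific observed means are averaged over the distribution
    of w among ALL cluster members, measured or not.
    """
    w = np.asarray(w)
    delta = np.asarray(delta).astype(bool)
    y = np.asarray(y, dtype=float)
    total = 0.0
    for level in np.unique(w):
        in_stratum = w == level
        observed = in_stratum & delta
        if not observed.any():
            raise ValueError(f"no measured outcomes in stratum w={level}")
        total += in_stratum.mean() * y[observed].mean()
    return total


def stage2_effect(y_c, a, e):
    """Stage 2 (simplified): covariate-adjusted difference in cluster-level means.

    y_c : Stage 1 estimates, one per cluster
    a   : cluster-level treatment indicator (1 = intervention, 0 = control)
    e   : one cluster-level baseline covariate (hypothetical adjustment variable)

    Returns the OLS coefficient on a from y_c ~ 1 + a + centered e.
    """
    y_c = np.asarray(y_c, dtype=float)
    a = np.asarray(a, dtype=float)
    e = np.asarray(e, dtype=float)
    design = np.column_stack([np.ones_like(y_c), a, e - e.mean()])
    beta, *_ = np.linalg.lstsq(design, y_c, rcond=None)
    return beta[1]


if __name__ == "__main__":
    # One toy cluster: the w == 1 stratum is under-measured.
    w = [0, 0, 1, 1]
    delta = [1, 1, 1, 0]
    y = [0.0, 1.0, 1.0, np.nan]
    print("unadjusted observed mean:", np.nanmean(np.where(delta, y, np.nan)))
    print("stage 1 adjusted mean:   ", stage1_cluster_mean(w, delta, y))

    # Four toy clusters with made-up Stage 1 estimates.
    print("stage 2 adjusted effect: ",
          stage2_effect([1.0, 2.0, 3.0, 4.0], [0, 1, 0, 1], [0, 0, 1, 1]))
```

The design point the sketch tries to convey is the separation of concerns: missing outcomes are handled inside each cluster first, so that the second stage only has to compare cluster-level summaries between arms and can spend its adjustment on precision.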
