The LOOP Estimator: Adjusting for Covariates in Randomized Experiments

Background: When conducting a randomized controlled trial, it is common to specify in advance the statistical analyses that will be applied to the data. Typically, these analyses involve adjusting for small imbalances in baseline covariates. This poses a dilemma: adjusting for too many covariates can hurt precision more than it helps, and it is often unclear before the experiment is run which covariates are predictive of the outcome. Objectives: This article aims to produce a covariate adjustment method that allows for automatic variable selection, so that practitioners need not commit to any specific set of covariates prior to seeing the data. Results: In this article, we propose the “leave-one-out potential outcomes” (LOOP) estimator. We leave out each observation in turn and impute that observation’s treatment and control potential outcomes from the remaining observations, using a prediction algorithm such as a random forest. In addition to allowing for automatic variable selection, this estimator is unbiased under the Neyman–Rubin model and generally performs at least as well as the unadjusted estimator, and the statistical assumptions it relies on are largely justified by the experimental randomization itself.
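The leave-one-out procedure described in the abstract can be illustrated with a minimal sketch. The abstract does not give the estimator's formula, so the inverse-probability weighting step below is an assumption (a Horvitz-Thompson-style form for a trial with known treatment probability p), and the names `loop_estimate` and `ols_imputer` are hypothetical. A one-covariate least-squares fit stands in for the random forest so that the example needs only the standard library.

```python
from statistics import mean


def ols_imputer(x_train, y_train, x_new):
    """Hypothetical stand-in for a random forest: one-covariate least squares."""
    x_bar, y_bar = mean(x_train), mean(y_train)
    sxx = sum((x - x_bar) ** 2 for x in x_train)
    if sxx == 0:
        # No variation in the covariate: fall back to the training mean.
        return y_bar
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_train, y_train)) / sxx
    return y_bar + slope * (x_new - x_bar)


def loop_estimate(y, t, x, p, imputer):
    """Average treatment effect estimate via leave-one-out imputation.

    y: outcomes, t: 0/1 treatment indicators, x: a single covariate,
    p: known probability of assignment to treatment.
    ASSUMPTION: the weighting below is not stated in the abstract.
    """
    n = len(y)
    unit_effects = []
    for i in range(n):
        # Leave observation i out and split the rest by treatment arm.
        treated = [j for j in range(n) if j != i and t[j] == 1]
        control = [j for j in range(n) if j != i and t[j] == 0]
        # Impute both potential outcomes of unit i from the other units only.
        t_hat = imputer([x[j] for j in treated], [y[j] for j in treated], x[i])
        c_hat = imputer([x[j] for j in control], [y[j] for j in control], x[i])
        # Blend the two imputations, then weight the residual by the inverse
        # of the probability of the arm that unit i actually received.
        m_hat = (1 - p) * t_hat + p * c_hat
        if t[i] == 1:
            unit_effects.append((y[i] - m_hat) / p)
        else:
            unit_effects.append(-(y[i] - m_hat) / (1 - p))
    return mean(unit_effects)


# Toy data (made up for illustration): outcome is linear in x with a
# constant treatment effect of 3.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
t = [1, 0, 1, 0, 1, 0, 1, 0]
y = [2.0 * xi + 3.0 * ti for xi, ti in zip(x, t)]
estimate = loop_estimate(y, t, x, p=0.5, imputer=ols_imputer)
```

Because the imputations for unit i never use unit i's own outcome or treatment assignment, the imputer can be swapped for any prediction algorithm, including one that performs its own variable selection, without changing the rest of the procedure. On this noiseless toy data the linear imputer is exact, so the estimate equals the true effect of 3 up to floating-point error.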
