Inference Under Covariate-Adaptive Randomization

ABSTRACT This article studies inference for the average treatment effect in randomized controlled trials with covariate-adaptive randomization. Here, by covariate-adaptive randomization, we mean randomization schemes that first stratify according to baseline covariates and then assign treatment status so as to achieve “balance” within each stratum. Our main requirement is that the randomization scheme assigns treatment status within each stratum so that the fraction of units being assigned to treatment within each stratum has a well-behaved distribution centered around a proportion π as the sample size tends to infinity. Such schemes include, for example, Efron’s biased-coin design and stratified block randomization. When testing the null hypothesis that the average treatment effect equals a prespecified value in such settings, we first show the usual two-sample t-test is conservative in the sense that it has limiting rejection probability under the null hypothesis no greater than and typically strictly less than the nominal level. We show, however, that a simple adjustment to the usual standard error of the two-sample t-test leads to a test that is exact in the sense that its limiting rejection probability under the null hypothesis equals the nominal level. Next, we consider the usual t-test (on the coefficient on treatment assignment) in a linear regression of outcomes on treatment assignment and indicators for each of the strata. We show that this test is exact for the important special case of randomization schemes with π = 1/2, but is otherwise conservative. We again provide a simple adjustment to the standard errors that yields an exact test more generally. Finally, we study the behavior of a modified version of a permutation test, which we refer to as the covariate-adaptive permutation test, that only permutes treatment status for units within the same stratum. When applied to the usual two-sample t-statistic, we show that this test is exact for randomization schemes with π = 1/2 and that additionally achieve what we refer to as “strong balance.” For randomization schemes with π ≠ 1/2, this test may have limiting rejection probability under the null hypothesis strictly greater than the nominal level. When applied to a suitably adjusted version of the two-sample t-statistic, however, we show that this test is exact for all randomization schemes that achieve “strong balance,” including those with π ≠ 1/2. A simulation study confirms the practical relevance of our theoretical results. We conclude with recommendations for empirical practice and an empirical illustration. Supplementary materials for this article are available online.
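
To make the idea of permuting treatment status only within strata concrete, the following is a minimal Python sketch, not the article's implementation. It assumes a null hypothesis of zero average treatment effect, uses the usual (unadjusted) two-sample t-statistic with an unequal-variance standard error, and approximates the permutation distribution by random draws. The function names `two_sample_t` and `stratified_permutation_pvalue`, and the parameters `n_perm` and `seed`, are hypothetical and chosen only for illustration.

```python
import numpy as np


def two_sample_t(y, a):
    """Usual two-sample t-statistic (unequal-variance standard error)."""
    y1, y0 = y[a == 1], y[a == 0]
    se = np.sqrt(y1.var(ddof=1) / y1.size + y0.var(ddof=1) / y0.size)
    return (y1.mean() - y0.mean()) / se


def stratified_permutation_pvalue(y, a, strata, n_perm=999, seed=0):
    """Two-sided p-value from permuting treatment only within each stratum.

    Illustrative assumption: the null is a zero average treatment effect,
    and the permutation distribution is approximated by n_perm random draws.
    """
    rng = np.random.default_rng(seed)
    observed = abs(two_sample_t(y, a))
    # Index sets of the units belonging to each stratum.
    groups = [np.flatnonzero(strata == s) for s in np.unique(strata)]
    exceed = 0
    for _ in range(n_perm):
        a_perm = a.copy()
        for idx in groups:
            # Shuffle treatment labels among units of the same stratum only,
            # so the number treated in every stratum is unchanged.
            a_perm[idx] = rng.permutation(a[idx])
        if abs(two_sample_t(y, a_perm)) >= observed:
            exceed += 1
    # Count the observed assignment itself so the p-value is never zero.
    return (1 + exceed) / (1 + n_perm)
```

Because labels are shuffled only within strata, every permuted assignment keeps the same number of treated units in each stratum as the realized one. According to the abstract, this unadjusted version is exact only for schemes with π = 1/2 that achieve “strong balance”; the suitably adjusted t-statistic is not reproduced here.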
