A Framework for the Meta-Analysis of Randomized Experiments with Applications to Heavy-Tailed Response Data

A central obstacle in the objective assessment of treatment effect (TE) estimators in randomized controlled trials (RCTs) is the lack of a ground truth (or validation set) against which to test their performance. In this paper, we provide a novel cross-validation-like methodology to address this challenge. The key insight of our procedure is that the noisy (but unbiased) difference-of-means estimate can be used as a ground truth “label” on one portion of the RCT to test the performance of an estimator trained on the other portion. We combine this insight with an aggregation scheme, which borrows statistical strength across a large collection of RCTs, to present an end-to-end methodology for judging an estimator’s ability to recover the underlying treatment effect. We evaluate our methodology across 709 RCTs implemented in the Amazon supply chain. In this corpus of A/B tests at Amazon, we highlight the unique difficulties of recovering the treatment effect that arise from the heavy-tailed nature of the response variables. In this heavy-tailed setting, our methodology suggests that procedures that aggressively downweight or truncate large values, while introducing bias, lower the variance enough that the treatment effect is estimated more accurately.
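The held-out "label" idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the 50/50 random split, the squared-error loss, the 0.99 winsorization quantile, and the simulated Pareto responses are all assumptions made for illustration, and the paper's aggregation scheme across RCTs is not reproduced here.

```python
import numpy as np


def diff_of_means(y, t):
    """Unbiased difference-of-means estimate of the treatment effect."""
    return y[t == 1].mean() - y[t == 0].mean()


def winsorized_diff_of_means(y, t, q=0.99):
    """Hypothetical truncating estimator: cap responses at the q-th quantile.

    This introduces bias but can reduce variance under heavy tails.
    """
    cap = np.quantile(y, q)
    return diff_of_means(np.minimum(y, cap), t)


def held_out_errors(y, t, estimators, n_splits=50, seed=0):
    """Score each estimator against a noisy held-out label within one RCT.

    For every random split, each estimator is fit on one half and compared
    (by squared error, an assumed loss) with the difference-of-means
    computed on the other half, which serves as the "label".
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    errors = {name: [] for name in estimators}
    for _ in range(n_splits):
        perm = rng.permutation(n)
        train, test = perm[: n // 2], perm[n // 2:]
        label = diff_of_means(y[test], t[test])
        for name, estimate in estimators.items():
            errors[name].append((estimate(y[train], t[train]) - label) ** 2)
    return {name: float(np.mean(v)) for name, v in errors.items()}


if __name__ == "__main__":
    # Simulated heavy-tailed RCT (assumption): Pareto noise, true effect 0.5.
    rng = np.random.default_rng(1)
    t = rng.integers(0, 2, size=5000)
    y = rng.pareto(1.5, size=5000) + 0.5 * t
    estimators = {
        "difference-of-means": diff_of_means,
        "winsorized": winsorized_diff_of_means,
    }
    print(held_out_errors(y, t, estimators))
```

Because the held-out label is itself noisy, the error from a single RCT is only a rough signal; the abstract's point is that such signals become informative once they are aggregated over many RCTs.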
