Combining observational and experimental data to find heterogeneous treatment effects

Every design choice will have different effects on different units. However, traditional A/B tests are often underpowered to identify these heterogeneous effects, especially when the set of unit-level attributes is high-dimensional and our priors about which covariates matter are weak. At the same time, observational data sets that are orders of magnitude larger are often available. We propose a method that combines these two data sources to estimate heterogeneous treatment effects. First, we use observational time series data to estimate a mapping from covariates to unit-level effects. These estimates are likely biased, but under some conditions the bias preserves the relative rank ordering of units. If these conditions hold, we need only enough experimental data to identify a monotonic, one-dimensional transformation from observationally predicted treatment effects to real treatment effects. This greatly reduces power demands and makes heterogeneous effects much easier to detect. As an application, we show how our method can be used to improve Facebook page recommendations.
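
As a rough illustration of the second step, the sketch below assumes each experimental unit already carries an observationally predicted effect (called `score` here). It bins units by score, takes the treated-minus-control difference in mean outcomes within each bin, and fits a non-decreasing map from score to effect with the pool-adjacent-violators algorithm. The binning scheme, the isotonic fit, all function names, and the simulated data are assumptions made for illustration; the abstract does not specify how the monotonic transformation is estimated.

```python
import random
from statistics import mean


def pava(values, weights):
    """Weighted non-decreasing least-squares fit (pool adjacent violators)."""
    blocks = []  # each block: [weighted mean, total weight, number of points]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Merge the last two blocks while they violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, w2, n2 = blocks.pop()
            v1, w1, n1 = blocks.pop()
            blocks.append([(v1 * w1 + v2 * w2) / (w1 + w2), w1 + w2, n1 + n2])
    fitted = []
    for v, _, n in blocks:
        fitted.extend([v] * n)
    return fitted


def calibrate(scores, treated, outcomes, n_bins=5):
    """Map observational scores to experimental effects, bin by bin.

    Hypothetical helper: returns (mean score per bin, monotone fitted effect per bin).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    size = len(order) // n_bins
    bin_scores, bin_effects, bin_weights = [], [], []
    for b in range(n_bins):
        last = b == n_bins - 1
        idx = order[b * size:] if last else order[b * size:(b + 1) * size]
        y1 = [outcomes[i] for i in idx if treated[i]]
        y0 = [outcomes[i] for i in idx if not treated[i]]
        if not y1 or not y0:
            continue  # skip bins lacking either arm
        bin_scores.append(mean(scores[i] for i in idx))
        bin_effects.append(mean(y1) - mean(y0))  # difference in means
        bin_weights.append(len(idx))
    return bin_scores, pava(bin_effects, bin_weights)


# Simulated experiment (assumption): the observational score overstates the
# true effect but preserves its rank ordering across units.
random.seed(0)
n = 2000
scores = [random.random() for _ in range(n)]
true_effect = [0.2 * s ** 2 for s in scores]
treated = [random.random() < 0.5 for _ in range(n)]
outcomes = [random.gauss(0, 1) + (te if t else 0.0)
            for te, t in zip(true_effect, treated)]

bin_scores, fitted = calibrate(scores, treated, outcomes)
for s, f in zip(bin_scores, fitted):
    print(f"mean score {s:.2f} -> calibrated effect {f:.3f}")
```

Because only a one-dimensional monotone curve is estimated from the experiment, the experimental sample is spent on a handful of bin-level contrasts instead of a separate contrast for every covariate combination.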
