Machine Learning Methods for Estimating Heterogeneous Causal Effects

In this paper we study the problems of estimating heterogeneity in causal effects in experimental or observational studies and conducting inference about the magnitude of the differences in treatment effects across subsets of the population. In applications, our method provides a data-driven approach to determine which subpopulations have large or small treatment effects and to test hypotheses about the differences in these effects. For experiments, our method allows researchers to identify heterogeneity in treatment effects that was not specified in a pre-analysis plan, without concern about invalidating inference due to multiple testing. In most of the literature on supervised machine learning (e.g., regression trees, random forests, and the LASSO), the goal is to build a model of the relationship between a unit’s attributes and an observed outcome. A prominent role in these methods is played by cross-validation, which compares predictions to actual outcomes in test samples in order to select the level of model complexity that provides the best predictive power. Our method is closely related, but it differs in that it is tailored for predicting causal effects of a treatment rather than a unit’s outcome. The challenge is that the "ground truth" for a causal effect is not observed for any individual unit: we observe the unit with the treatment,
