Learning Objectives for Treatment Effect Estimation

We develop a general class of two-step algorithms for heterogeneous treatment effect estimation in observational studies. We first estimate marginal effects and treatment propensities to form an objective function that isolates the heterogeneous treatment effects, and then optimize this learned objective. This approach has several advantages over existing methods. From a practical perspective, our method is flexible and easy to use: in both steps, we can use any method of our choice, e.g., penalized regression, a deep network, or boosting; moreover, these methods can be fine-tuned by cross-validating on the learned objective. Meanwhile, in the case of penalized kernel regression, we show that our method has a quasi-oracle property, whereby even if our pilot estimates of the marginal effects and treatment propensities are not particularly accurate, we achieve the same regret bounds as an oracle with a priori knowledge of these nuisance components. We implement variants of our method based on both penalized regression and convolutional neural networks, and find promising performance relative to existing baselines.
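Concretely, writing $m(x) = \mathbb{E}[Y \mid X = x]$ for the conditional mean outcome, $e(x) = \mathbb{P}[W = 1 \mid X = x]$ for the treatment propensity, and $\tau(x)$ for the heterogeneous treatment effect, one natural way to instantiate the learned objective is via a Robinson (1988)-style residual-on-residual loss. The display below is a sketch in our own notation, with cross-fitted pilot estimates $\hat{m}^{(-i)}$ and $\hat{e}^{(-i)}$ and a regularizer $\Lambda_n$ chosen by the analyst:

$$\hat{\tau}(\cdot) = \operatorname*{argmin}_{\tau}\left\{ \frac{1}{n} \sum_{i=1}^{n} \Big[ \big( Y_i - \hat{m}^{(-i)}(X_i) \big) - \big( W_i - \hat{e}^{(-i)}(X_i) \big)\, \tau(X_i) \Big]^2 + \Lambda_n\big( \tau(\cdot) \big) \right\}$$

Minimizing this objective over any function class of interest (a penalized linear model, a kernel class, a neural network) constitutes the second step of the two-step procedure, and the same empirical loss can serve as the cross-validation criterion for tuning.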
