Debiased machine learning of conditional average treatment effects and other causal functions

This paper provides estimation and inference methods for the best linear predictor (approximation) of a structural function, such as conditional average structural and treatment effects or structural derivatives, based on modern machine learning (ML) tools. We represent this structural function as the conditional expectation of an unbiased signal that depends on a nuisance parameter, which we estimate by modern ML techniques. We first adjust the signal to make it insensitive (Neyman-orthogonal) to the first-stage regularization bias. We then project the signal onto a set of basis functions, whose number grows with the sample size, which gives us the best linear predictor of the structural function. We derive a complete set of results for estimation and simultaneous inference on all parameters of the best linear predictor, conducting inference by Gaussian bootstrap. When the structural function is smooth and the basis is sufficiently rich, our estimation and inference results automatically target this function. When the basis functions are group indicators, the best linear predictor reduces to the group average treatment/structural effect, and our inference automatically targets these parameters. We demonstrate our method by estimating uniform confidence bands for the average price elasticity of gasoline demand conditional on income.
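The two-stage procedure described in the abstract can be illustrated with a minimal sketch for one special case, the conditional average treatment effect with a binary treatment. Everything in it is an assumption made for illustration and not the paper's implementation: the data are simulated, the orthogonal signal is the standard doubly robust (AIPW) score, the first-stage nuisances are cross-fitted polynomial least-squares fits standing in for ML learners, the propensity score is a clipped linear probability model, and the function names (`dr_signal_crossfit`, `best_linear_predictor`, `uniform_band`) are hypothetical. The band uses simulated Gaussian draws for a sup-t critical value, in the spirit of the Gaussian bootstrap mentioned above.

```python
import numpy as np


def poly_features(x, degree):
    """Columns 1, x, ..., x^degree for a 1-D array x."""
    return np.vander(x, degree + 1, increasing=True)


def ols(X, y):
    """Least-squares coefficients of y on X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def dr_signal_crossfit(y, d, x, n_folds=2, seed=0):
    """Cross-fitted doubly robust (AIPW) signal for a binary treatment d.

    Assumption: cubic polynomial regressions stand in for ML first stages.
    """
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    signal = np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        F_tr, F_te = poly_features(x[train], 3), poly_features(x[test], 3)
        treated = d[train] == 1
        # Nuisances are fitted on the other folds only (cross-fitting).
        b1 = ols(F_tr[treated], y[train][treated])
        b0 = ols(F_tr[~treated], y[train][~treated])
        bp = ols(F_tr, d[train].astype(float))
        mu1, mu0 = F_te @ b1, F_te @ b0
        ps = np.clip(F_te @ bp, 0.05, 0.95)  # clipped propensity score
        dt, yt = d[test], y[test]
        signal[test] = (mu1 - mu0
                        + dt * (yt - mu1) / ps
                        - (1 - dt) * (yt - mu0) / (1 - ps))
    return signal


def best_linear_predictor(signal, basis):
    """Project the signal on the basis; return coefficients and a
    heteroskedasticity-robust (sandwich) covariance estimate."""
    n = len(signal)
    beta = ols(basis, signal)
    resid = signal - basis @ beta
    q_inv = np.linalg.inv(basis.T @ basis / n)
    scores = basis * resid[:, None]
    cov = q_inv @ (scores.T @ scores / n) @ q_inv / n
    return beta, cov


def uniform_band(beta, cov, grid_basis, level=0.95, n_draws=2000, seed=0):
    """Uniform band over a grid from simulated Gaussian sup-t draws."""
    rng = np.random.default_rng(seed)
    fit = grid_basis @ beta
    se = np.sqrt(np.einsum("ij,jk,ik->i", grid_basis, cov, grid_basis))
    draws = rng.multivariate_normal(np.zeros(len(beta)), cov, size=n_draws)
    t_max = np.max(np.abs(draws @ grid_basis.T) / se, axis=1)
    crit = np.quantile(t_max, level)
    return fit - crit * se, fit + crit * se


# Simulated example (hypothetical design): true effect tau(x) = 1 + 2x.
rng = np.random.default_rng(42)
n = 4000
x = rng.uniform(-1.0, 1.0, n)
d = (rng.uniform(size=n) < 0.5 + 0.2 * x).astype(int)
y = x**2 + d * (1.0 + 2.0 * x) + rng.normal(size=n)

signal = dr_signal_crossfit(y, d, x)
beta, cov = best_linear_predictor(signal, poly_features(x, 1))

grid = np.linspace(-0.9, 0.9, 25)
lo, hi = uniform_band(beta, cov, poly_features(grid, 1))
```

In this sketch the basis is fixed at (1, x), so `beta` estimates the coefficients of the best linear approximation to the simulated effect function. Replacing `poly_features(x, 1)` with a matrix of group indicators would instead give group average effects, matching the special case mentioned in the abstract.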
