Efficient Inference of Average Treatment Effects in High Dimensions via Approximate Residual Balancing

There are many settings where researchers are interested in estimating average treatment effects and are willing to rely on the unconfoundedness assumption, which requires that the treatment assignment be as good as random conditional on pre-treatment variables. The unconfoundedness assumption is often more plausible when a large number of pre-treatment variables are included in the analysis, but this can worsen the finite-sample properties of standard approaches to treatment effect estimation. Several recent proposals extend classical methods to the high-dimensional setting; however, to our knowledge, all existing methods rely on consistent estimability of the propensity score, i.e., the probability of receiving treatment given pre-treatment variables. In this paper, we propose a new method for estimating average treatment effects in high-dimensional linear settings that attains dimension-free rates of convergence under substantially weaker assumptions than existing methods: instead of requiring the propensity score to be estimable, we only require overlap, i.e., that the propensity score be uniformly bounded away from 0 and 1. Procedurally, our method combines balancing weights with a regularized regression adjustment.
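To make the combination concrete, here is a minimal sketch of the estimator for the average treatment effect on the treated under a linear outcome model; the notation ($\bar X_t$, $\hat\beta_c$, $\gamma$, $\zeta$) and the particular penalty and constraint choices below are illustrative and consistent with the description above, not a verbatim quotation of the paper:

\[
  \hat\tau \;=\; \frac{1}{n_t} \sum_{\{i:\, W_i = 1\}} Y_i
  \;-\; \Bigl( \bar X_t^{\top} \hat\beta_c
  \;+\; \sum_{\{i:\, W_i = 0\}} \gamma_i \bigl( Y_i - X_i^{\top} \hat\beta_c \bigr) \Bigr),
\]

where $\hat\beta_c$ is a regularized (e.g., lasso or elastic-net) regression of the outcome on the covariates fit on the control observations, $\bar X_t$ is the average covariate vector of the treated, and the weights $\gamma$ are chosen to approximately balance the covariates, e.g.,

\[
  \gamma \;=\; \operatorname*{arg\,min}_{\gamma \,\ge\, 0,\; \sum_i \gamma_i = 1}\;
  (1 - \zeta)\, \lVert \gamma \rVert_2^2
  \;+\; \zeta\, \bigl\lVert \bar X_t - X_c^{\top} \gamma \bigr\rVert_\infty^2 .
\]

Intuitively, the regression adjustment corrects for the covariate imbalance the weights cannot fully remove, while the weights absorb the confounding left in the residuals of the regularized fit; overlap, rather than estimability of the propensity score, is what keeps such balancing weights well behaved.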
