Approximate residual balancing: debiased inference of average treatment effects in high dimensions

There are many settings where researchers are interested in estimating average treatment effects and are willing to rely on the unconfoundedness assumption, which requires that treatment assignment be as good as random conditional on pretreatment variables. The unconfoundedness assumption is often more plausible if a large number of pretreatment variables are included in the analysis, but this can worsen the performance of standard approaches to treatment effect estimation. We develop a method for debiasing penalized regression adjustments that allows sparse regression methods like the lasso to be used for √n-consistent inference of average treatment effects in high-dimensional linear models. Given linearity, we do not need to assume that the treatment propensities are estimable, or that the average treatment effect is a sparse contrast of the outcome model parameters. Rather, in addition to the standard assumptions used to make lasso regression on the outcome model consistent under 1-norm error, we require only overlap, i.e., that the propensity score be uniformly bounded away from 0 and 1. Procedurally, our method combines balancing weights with a regularized regression adjustment.

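To make the procedure concrete, the following is a minimal illustrative sketch in Python, not the authors' implementation. It estimates the average treatment effect on the treated under several simplifying assumptions: the outcome model is fit with cross-validated lasso from scikit-learn, and the approximately balancing weights are obtained by penalizing a squared Euclidean covariate imbalance (in place of a sup-norm imbalance measure) so that a generic smooth solver suffices. The function name and the tuning parameter zeta are introduced here purely for illustration.

```python
# Illustrative sketch of approximate residual balancing for the ATT:
# lasso outcome regression on the controls + approximately balancing weights,
# with a weighted residual correction. Simplifications are noted in comments.
import numpy as np
from scipy.optimize import minimize
from sklearn.linear_model import LassoCV

def approximate_residual_balancing_att(X, W, Y, zeta=0.5, seed=0):
    """Estimate the average treatment effect on the treated (ATT).

    X : (n, p) covariate matrix; W : (n,) array of 0/1 treatment indicators;
    Y : (n,) outcome vector. zeta trades off weight dispersion against
    covariate imbalance (a hypothetical tuning parameter for this sketch).
    """
    Xc, Yc = X[W == 0], Y[W == 0]      # control units
    Xt, Yt = X[W == 1], Y[W == 1]      # treated units
    xbar_t = Xt.mean(axis=0)           # treated covariate means to be balanced

    # Step 1: penalized (lasso) regression of control outcomes on covariates.
    lasso = LassoCV(cv=5, random_state=seed).fit(Xc, Yc)
    mu_c_at_treated = lasso.predict(xbar_t.reshape(1, -1))[0]

    # Step 2: approximately balancing weights over the control units.
    # Squared Euclidean imbalance is used here so the objective is smooth;
    # the weights sum to one and are constrained to be nonnegative.
    n_c = Xc.shape[0]

    def objective(g):
        imbalance = xbar_t - Xc.T @ g
        return zeta * np.sum(g ** 2) + (1.0 - zeta) * np.sum(imbalance ** 2)

    res = minimize(
        objective,
        x0=np.full(n_c, 1.0 / n_c),
        bounds=[(0.0, 1.0)] * n_c,
        constraints=[{"type": "eq", "fun": lambda g: np.sum(g) - 1.0}],
        method="SLSQP",
    )
    gamma = res.x

    # Step 3: regression prediction plus weighted residual correction.
    residuals = Yc - lasso.predict(Xc)
    mu_c_hat = mu_c_at_treated + gamma @ residuals
    return Yt.mean() - mu_c_hat
```

The final estimate has the residual-balancing form described above: a regression prediction at the treated covariate means, corrected by a weighted sum of control-group residuals, where the weights are chosen to balance covariates directly rather than to estimate propensity scores.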