Balancing Out Regression Error: Efficient Treatment Effect Estimation without Smooth Propensities

There has been a recent surge of interest in doubly robust approaches to treatment effect estimation in observational studies, driven by a realization that they can be combined with modern machine learning methods to obtain estimators that pair good finite-sample performance with asymptotic efficiency. These methods first fit a regularized regression model to the observed outcomes, and then use a weighted sum of residuals to debias it. Typically, the debiasing weights are obtained by inverting a carefully tuned estimate of the propensity scores, and this choice can be justified by asymptotic arguments. However, there is no good reason to believe that an optimally tuned propensity model would also yield optimally tuned debiasing weights in finite samples. In this paper, we study an alternative approach to efficient treatment effect estimation based on using weights that directly optimize worst-case risk bounds; concretely, this amounts to selecting weights that uniformly balance out a class of functions known to capture the errors of the outcome regression with high probability. We provide general conditions under which our method achieves the semiparametric efficiency bound; in particular, unlike existing methods, we do not assume any regularity on the treatment propensities beyond overlap. In extensive experiments, we find that our method, weighting for uniform balance, compares favorably to augmented inverse-propensity weighting and targeted maximum likelihood estimation.
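As a rough illustration of the "regress, then debias with weighted residuals" recipe described above, the following sketch estimates the mean treated outcome with balancing weights rather than inverted propensity scores. It is not the paper's procedure: the function class is assumed to be linear functions with bounded l2 norm (so worst-case imbalance has a closed form), the outcome model is a plain ridge regression, and all names (`ridge_fit`, `balancing_weights`, `augmented_estimate`, `lam_reg`, `lam_w`) are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Ridge coefficients; X is assumed to already contain an intercept column."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def balancing_weights(X_all, X_treated, lam):
    """Weights gamma on treated units that minimize
    ||mean(X_all) - X_treated.T @ gamma||^2 + lam * ||gamma||^2.
    The first term is the squared worst-case imbalance over linear
    functions with unit l2 norm (an assumption of this sketch)."""
    target = X_all.mean(axis=0)
    d = X_treated.shape[1]
    inner = np.linalg.solve(X_treated.T @ X_treated + lam * np.eye(d), target)
    return X_treated @ inner


def augmented_estimate(X, w, y, lam_reg=1.0, lam_w=1e-2):
    """Estimate E[Y(1)]: regression prediction plus weighted treated residuals."""
    X_t, y_t = X[w], y[w]
    beta = ridge_fit(X_t, y_t, lam_reg)             # step 1: regularized regression
    gamma = balancing_weights(X, X_t, lam_w)        # step 2: balancing weights
    residuals = y_t - X_t @ beta
    estimate = (X @ beta).mean() + gamma @ residuals  # step 3: debias
    return estimate, gamma


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 500
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
    propensity = 1.0 / (1.0 + np.exp(-X[:, 1]))
    w = rng.random(n) < propensity                  # boolean treatment indicator
    y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(size=n)
    est, gamma = augmented_estimate(X, w, y)
    print("augmented estimate of E[Y(1)]:", est)
    print("remaining imbalance:", np.linalg.norm(X.mean(axis=0) - X[w].T @ gamma))
```

The penalty `lam_w` trades off imbalance against the size of the weights: as it shrinks, the weighted treated covariate means match the full-sample means more closely, at the cost of larger and more variable weights. No propensity model is fitted at any point in this sketch.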
