DeepMatch: Balancing Deep Covariate Representations for Causal Inference Using Adversarial Training

We study optimal covariate balance for causal inference from observational data in settings where rich covariates and complex relationships necessitate flexible modeling with neural networks. Standard approaches such as propensity weighting and matching/balancing fail in these settings due to miscalibrated propensity networks and inappropriate covariate representations, respectively. We propose a new method based on adversarial training of a weighting network and a discriminator network that effectively addresses this methodological gap. We demonstrate this through new theoretical characterizations of the method as well as empirical results using both fully connected architectures to learn complex relationships and convolutional architectures to handle image confounders, showing how the method can enable strong causal analyses in these challenging settings.
