The Counterfactual $\chi$-GAN

Causal inference often relies on the counterfactual framework, which requires strong ignorability: treatment assignment must be independent of the potential outcomes given the observed covariates, and every unit must have a nonzero probability of receiving each treatment. Weighting and matching are common approaches to enforcing strong ignorability in causal analyses of observational data. Effect estimates, such as the average treatment effect (ATE), are then computed as expectations under the reweighted or matched distribution, P. The choice of P matters: it affects both the interpretation of the effect estimate and its variance. In this work, instead of specifying P, we learn a distribution that simultaneously maximizes coverage and minimizes the variance of ATE estimates. To learn this distribution, we propose a generative adversarial network (GAN)-based model, the Counterfactual $\chi$-GAN (cGAN), which also learns feature-balancing weights and supports unbiased causal estimation in the absence of unobserved confounding. Our model minimizes the Pearson $\chi^2$ divergence, which we show simultaneously maximizes coverage and minimizes the variance of importance sampling estimates. To our knowledge, this is the first such application of the Pearson $\chi^2$ divergence. We demonstrate the effectiveness of cGAN in achieving feature balance relative to established weighting methods, both in simulation and on real-world medical data.
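The link between the Pearson $\chi^2$ divergence and estimator variance follows from a standard identity. If $x_i \sim Q$ and $w(x) = p(x)/q(x)$, then $E_Q[w] = 1$ and $\chi^2(P \| Q) = E_Q[w^2] - 1 = \mathrm{Var}_Q(w)$. Lowering the divergence therefore lowers the variance of the importance weights, and the Kish effective sample size of $n$ weighted draws becomes $n/(1+\hat\chi^2)$. The Python sketch below illustrates these quantities on a hypothetical simulated dataset. It is not the cGAN: the weights come from a known propensity score rather than an adversarially learned distribution. The data-generating process, the function names, the Hajek (self-normalized) ATE estimator, and the standardized-mean-difference balance check are all illustrative assumptions.

```python
import numpy as np


def pearson_chi2_from_weights(w):
    """Plug-in estimate of chi^2(P || Q) from weights w_i proportional to p(x_i)/q(x_i), x_i ~ Q.

    After rescaling to mean 1, E_Q[w] = 1, so chi^2(P || Q) = E_Q[w^2] - 1 = Var_Q(w).
    """
    w = np.asarray(w, dtype=float)
    w = w / w.mean()
    return float(np.mean(w ** 2) - 1.0)


def effective_sample_size(w):
    """Kish effective sample size; equals n / (1 + chi^2) for the plug-in estimate above."""
    w = np.asarray(w, dtype=float)
    return float(w.sum() ** 2 / np.sum(w ** 2))


def weighted_mean_difference(v, t, w):
    """Self-normalized (Hajek) weighted mean of v among treated minus among controls."""
    treated = t == 1
    m1 = np.sum(w[treated] * v[treated]) / np.sum(w[treated])
    m0 = np.sum(w[~treated] * v[~treated]) / np.sum(w[~treated])
    return float(m1 - m0)


def standardized_mean_difference(x, t, w):
    """Weighted mean difference of covariate x divided by the unweighted pooled SD."""
    pooled_sd = np.sqrt((x[t == 1].var() + x[t == 0].var()) / 2.0)
    return weighted_mean_difference(x, t, w) / pooled_sd


# Hypothetical simulated observational data: one confounder x drives both
# treatment and outcome, and the true ATE is 2.
rng = np.random.default_rng(0)
n = 50_000
x = rng.normal(size=n)
e = 1.0 / (1.0 + np.exp(-x))            # true propensity score P(T=1 | x)
t = (rng.uniform(size=n) < e).astype(int)
y = 2.0 * t + 3.0 * x + rng.normal(size=n)

# Inverse-propensity (ATE) weights from the known propensity; a learned
# weighting model such as cGAN would replace this step.
w = np.where(t == 1, 1.0 / e, 1.0 / (1.0 - e))
ones = np.ones(n)

naive_ate = weighted_mean_difference(y, t, ones)
ipw_ate = weighted_mean_difference(y, t, w)
smd_before = standardized_mean_difference(x, t, ones)
smd_after = standardized_mean_difference(x, t, w)

# Within the treated arm, 1/e(x) is proportional to p(x) / p(x | T=1), so the
# normalized weights are importance ratios from the treated covariate
# distribution to the full population.
w_treated = w[t == 1]
chi2_treated = pearson_chi2_from_weights(w_treated)
ess_treated = effective_sample_size(w_treated)

print(f"naive difference in means: {naive_ate:.2f}")
print(f"IPW (Hajek) ATE estimate: {ipw_ate:.2f}")
print(f"SMD of x before/after weighting: {smd_before:.2f} / {smd_after:.2f}")
print(f"treated arm: n={w_treated.size}, chi^2={chi2_treated:.2f}, ESS={ess_treated:.0f}")
```

The naive difference in means is biased by the confounder. Weighting removes the imbalance in `x` and recovers an estimate close to 2. The last line shows the trade-off the abstract describes: the larger the $\chi^2$ divergence between the weighted and observed distributions, the smaller the effective sample size and the noisier the weighted estimate.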
