Double Robust Representation Learning for Counterfactual Prediction

Causal inference, or counterfactual prediction, is central to decision making in healthcare, policy and the social sciences. To de-bias causal estimators with high-dimensional data in observational studies, recent advances suggest the importance of combining machine learning models for both the propensity score and the outcome function. We propose a novel, scalable method to learn double-robust representations for counterfactual prediction, leading to consistent causal estimation if the model for either the propensity score or the outcome, but not necessarily both, is correctly specified. Specifically, we use the entropy balancing method to learn the weights that minimize the Jensen-Shannon divergence of the representation between the treated and control groups, and we use these weights to make robust and efficient counterfactual predictions for both individual and average treatment effects. We provide theoretical justifications for the proposed method. The algorithm performs competitively with the state of the art on real-world and synthetic data.
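To make the weighting step concrete, the following is a minimal sketch of classical entropy balancing, which finds maximum-entropy weights on the control units so that their weighted covariate mean matches the treated-group mean. It is an illustration under stated assumptions, not the method proposed here: it balances first moments of raw covariates rather than a learned representation, it does not use the Jensen-Shannon criterion, and the function names, the Newton solver and the simulated data are all hypothetical choices made for this example.

```python
import numpy as np


def _dual(lam, z):
    # Convex dual objective: log-sum-exp of z @ lam, computed stably.
    s = z @ lam
    m = s.max()
    return m + np.log(np.exp(s - m).sum())


def entropy_balancing_weights(x_control, target_mean, max_iter=100, tol=1e-10):
    """Maximum-entropy weights on control units (hypothetical helper).

    Returns positive weights summing to 1 whose weighted covariate mean
    matches target_mean, found by damped Newton steps on the dual problem.
    """
    z = x_control - target_mean          # constraint becomes sum_i w_i z_i = 0
    lam = np.zeros(z.shape[1])
    for _ in range(max_iter):
        s = z @ lam
        w = np.exp(s - s.max())
        w /= w.sum()
        grad = w @ z                     # gradient of the dual = weighted mean of z
        if np.linalg.norm(grad) < tol:
            break
        # Hessian of the dual = weighted covariance of z (small ridge for stability).
        hess = (z * w[:, None]).T @ z - np.outer(grad, grad)
        step = np.linalg.solve(hess + 1e-12 * np.eye(z.shape[1]), grad)
        # Backtracking: halve the step until the dual objective does not increase.
        t, f0 = 1.0, _dual(lam, z)
        while _dual(lam - t * step, z) > f0 and t > 1e-8:
            t *= 0.5
        lam = lam - t * step
    s = z @ lam
    w = np.exp(s - s.max())
    return w / w.sum()


# Simulated data (assumption): treated covariates are shifted relative to controls.
rng = np.random.default_rng(0)
x_control = rng.normal(0.0, 1.0, size=(500, 2))
x_treated = rng.normal(0.5, 1.0, size=(200, 2))

weights = entropy_balancing_weights(x_control, x_treated.mean(axis=0))
balanced_mean = weights @ x_control      # should be close to x_treated.mean(axis=0)
```

Solving the dual keeps the optimization in as many dimensions as there are balance constraints, rather than one variable per control unit, which is one reason this formulation is commonly used.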
