Reliable Estimation of Individual Treatment Effect with Causal Information Bottleneck

Estimating individual-level treatment effects (ITE) from observational data is a challenging and important problem in causal machine learning, and it arises in diverse mission-critical applications. In this paper, we propose an information-theoretic approach to learning more reliable representations for ITE estimation. We leverage the Information Bottleneck (IB) principle, which addresses the trade-off between the conciseness and the predictive power of a representation. By introducing an extended graphical model for the causal information bottleneck, we encourage independence between the learned representation and the treatment type. We also introduce an additional regularizer, motivated by viewing ITE estimation as a semi-supervised learning problem, to ensure more reliable representations. Experimental results show that our model achieves state-of-the-art results and yields more reliable predictions, with uncertainty information, on real-world datasets.
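
For orientation, the two quantities named above can be written in their standard textbook forms. The notation is assumed for illustration and is not taken from the paper: covariates X, binary treatment T, potential outcomes Y(0) and Y(1), learned representation Z, and a trade-off weight β. The extended causal objective proposed in the paper may differ from this generic form.

```latex
% Assumed notation (not from the paper): X covariates, T treatment,
% Y(0), Y(1) potential outcomes, Z representation, \beta > 0 trade-off weight.
\begin{align}
  % Individual treatment effect at covariate value x
  \tau(x) &= \mathbb{E}\left[\, Y(1) - Y(0) \mid X = x \,\right] \\
  % Classical Information Bottleneck Lagrangian over stochastic encoders p(z | x)
  \min_{p(z \mid x)} \; & I(X; Z) \;-\; \beta \, I(Z; Y)
\end{align}
```

In this form, the first term rewards a concise representation and the second rewards predictive power, so a larger β favors prediction over compression. Independence between the representation and the treatment corresponds to the condition I(Z; T) = 0.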
