Causal Deep Information Bottleneck

Estimating causal effects in the presence of latent confounding is a problem that arises frequently in many tasks. In real-world applications such as medicine, accounting for latent confounding is even more challenging because the data are high-dimensional and noisy. In this work, we propose to estimate causal effects from the perspective of the information bottleneck principle by explicitly identifying a low-dimensional representation of the latent confounding. We prove theoretically that the proposed model can be used to recover the average causal effect. Experiments on both synthetic data and established causal benchmarks show that our method achieves state-of-the-art prediction accuracy and sample efficiency without sacrificing interpretability.
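For reference, the two quantities the abstract invokes can be written in their standard forms. This is a minimal sketch in our own notation, not the paper's exact variational objective: the information bottleneck Lagrangian of Tishby, Pereira and Bialek, and the average causal effect expressed with Pearl's do-operator (the trade-off parameter \beta and the variable names x, z, t, y are notational assumptions, not taken from the abstract).

% Information bottleneck: learn a stochastic encoder p(z|x) that compresses
% the covariates x while retaining information about the target y; the
% multiplier beta > 0 sets the compression/prediction trade-off.
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)

% Average causal effect of a binary treatment t on the outcome y:
\mathrm{ACE} \;=\; \mathbb{E}\left[\, y \mid do(t = 1) \,\right] \;-\; \mathbb{E}\left[\, y \mid do(t = 0) \,\right]

In deep variational treatments of the bottleneck objective, both mutual-information terms are replaced by tractable neural-network bounds; presumably a relaxation of this kind underlies the model summarized above.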
