Learning Representations for Counterfactual Inference

Observational studies are rising in importance due to the widespread accumulation of data in fields such as healthcare, education, employment and ecology. We consider the task of answering counterfactual questions such as "Would this patient have lower blood sugar had she received a different medication?" We propose a new algorithmic framework for counterfactual inference that brings together ideas from domain adaptation and representation learning. In addition to a theoretical justification, we perform an empirical comparison with previous approaches to causal inference from observational data. Our deep learning algorithm significantly outperforms the previous state of the art.
