Deep Counterfactual Networks with Propensity-Dropout

We propose a novel approach for inferring the individualized causal effects of a treatment (intervention) from observational data. Our approach conceptualizes causal inference as a multitask learning problem; we model a subject's potential outcomes using a deep multitask network with a set of layers shared between the factual and counterfactual outcomes, and a set of outcome-specific layers. The impact of selection bias in the observational data is alleviated via a propensity-dropout regularization scheme, in which the network is thinned for every training example via a dropout probability that depends on the associated propensity score. The network is trained in alternating phases, where in each phase we use the training examples of one of the two potential outcomes (the treated and control populations) to update the weights of the shared layers and the respective outcome-specific layers. Experiments conducted on data based on a real-world observational study show that our algorithm outperforms the state-of-the-art.
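The sketch below illustrates the three ideas in the abstract: a shared representation with one head per potential outcome, per-example dropout driven by the propensity score, and alternating training phases over the treated and control populations. It is a minimal PyTorch reading of the description, not the authors' implementation: the layer sizes, the propensity-dropout schedule (here, dropout decreasing with the binary entropy of the estimated propensity score, with an assumed offset `gamma`), and the helper names (`propensity_dropout_prob`, `per_example_dropout`, `DeepCounterfactualNet`) are all illustrative assumptions; the abstract only states that the dropout probability depends on the propensity score.

```python
# Minimal sketch of a deep counterfactual network with propensity-dropout (assumptions noted above).
import torch
import torch.nn as nn


def propensity_dropout_prob(propensity, gamma=1.0):
    """Illustrative schedule: good overlap (p near 0.5) -> high entropy -> low dropout."""
    p = propensity.clamp(1e-6, 1 - 1e-6)
    entropy = -(p * p.log2() + (1 - p) * (1 - p).log2())   # binary entropy in [0, 1]
    return (1.0 - 0.5 * (gamma + entropy)).clamp(0.0, 0.95)


def per_example_dropout(h, drop_prob, training=True):
    """Inverted dropout with a different rate for every example in the batch."""
    if not training:
        return h
    keep = (1.0 - drop_prob).unsqueeze(1)                   # (batch, 1)
    mask = (torch.rand_like(h) < keep).float()              # thin the layer per example
    return h * mask / keep


class DeepCounterfactualNet(nn.Module):
    """Shared representation layers plus one head per potential outcome."""

    def __init__(self, d_in, d_hidden=64):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU(),
                                    nn.Linear(d_hidden, d_hidden), nn.ReLU())
        self.head_control = nn.Linear(d_hidden, 1)          # outcome under no treatment
        self.head_treated = nn.Linear(d_hidden, 1)          # outcome under treatment

    def forward(self, x, drop_prob):
        h = per_example_dropout(self.shared(x), drop_prob, self.training)
        return self.head_control(h), self.head_treated(h)


def train(model, x, t, y, propensity, epochs=100, lr=1e-3):
    """Alternating phases: each phase uses one sub-population (control or treated) to
    update the shared layers and only the head of the factual outcome observed there."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    drop = propensity_dropout_prob(propensity)
    for _ in range(epochs):
        for phase in (0, 1):                                # 0 = control phase, 1 = treated phase
            idx = (t == phase)
            y0_hat, y1_hat = model(x[idx], drop[idx])
            y_hat = y1_hat if phase == 1 else y0_hat
            loss = nn.functional.mse_loss(y_hat.squeeze(1), y[idx])
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```

Because the loss in each phase involves only one outcome head, gradients flow to the shared layers and that head alone, which matches the alternating update described in the abstract; the propensity scores would be estimated beforehand, e.g. by a separate classifier of treatment assignment.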
