DESCN: Deep Entire Space Cross Networks for Individual Treatment Effect Estimation

Causal inference has wide applications in areas such as e-commerce and precision medicine, and its performance relies heavily on accurate estimation of the Individual Treatment Effect (ITE). Conventionally, ITE is predicted by modeling the treated and control response functions separately, each in its own sample space. In practice, this approach usually encounters two issues: divergent distributions between the treated and control groups due to treatment bias, and significant imbalance in their sample sizes. This paper proposes Deep Entire Space Cross Networks (DESCN) to model treatment effects from an end-to-end perspective. DESCN captures the integrated information of the treatment propensity, the response, and the hidden treatment effect through a cross network in a multi-task learning manner. Our method jointly learns the treatment and response functions in the entire sample space to avoid treatment bias, and employs an intermediate pseudo treatment effect prediction network to relieve sample imbalance. Extensive experiments are conducted on a synthetic dataset and a large-scale production dataset from the e-commerce voucher distribution business. The results indicate that DESCN improves both the accuracy of ITE estimation and the uplift ranking performance. The source code and a sample of the production dataset are released to facilitate future research in the community; to the best of our knowledge, this dataset is the first large-scale public biased treatment dataset for causal inference.
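
The abstract does not state the model equations, so the following is only a minimal sketch of how learning "in the entire sample space" could be expressed. It assumes binary treatment T and binary response Y, and assumes the network exposes three per-sample outputs: a propensity estimate, a treated response, and a control response. The joint probabilities P(Y=1, T=1 | x) and P(Y=1, T=0 | x) are then defined for every sample, not only within one group. The logit-space cross link between the two responses is likewise an assumption about how a pseudo treatment effect might be used. All function names are hypothetical and are not taken from the released source code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logit(p, eps=1e-7):
    p = np.clip(p, eps, 1.0 - eps)
    return np.log(p / (1.0 - p))

def entire_space_outputs(propensity, mu1, mu0):
    """Combine head outputs into quantities defined for all samples.

    propensity: estimated P(T=1 | x)
    mu1, mu0:   estimated P(Y=1 | x, T=1) and P(Y=1 | x, T=0)
    """
    p_treated_response = propensity * mu1          # P(Y=1, T=1 | x)
    p_control_response = (1.0 - propensity) * mu0  # P(Y=1, T=0 | x)
    ite = mu1 - mu0                                # estimated treatment effect
    return p_treated_response, p_control_response, ite

def cross_treated_response(mu0, pseudo_effect_logit):
    """Assumed cross link: shift the control response in logit space."""
    return sigmoid(logit(mu0) + pseudo_effect_logit)

def binary_cross_entropy(p, label, eps=1e-7):
    p = np.clip(p, eps, 1.0 - eps)
    return float(np.mean(-(label * np.log(p) + (1 - label) * np.log(1 - p))))

def entire_space_loss(propensity, mu1, mu0, t, y):
    """Sum of three losses, each computed over every sample."""
    p_tr, p_cr, _ = entire_space_outputs(propensity, mu1, mu0)
    return (binary_cross_entropy(propensity, t)
            + binary_cross_entropy(p_tr, y * t)
            + binary_cross_entropy(p_cr, y * (1 - t)))

# Toy values standing in for network outputs on two samples.
propensity = np.array([0.5, 0.8])
mu1 = np.array([0.6, 0.5])
mu0 = np.array([0.2, 0.25])
t = np.array([1, 0])
y = np.array([1, 0])

p_tr, p_cr, ite = entire_space_outputs(propensity, mu1, mu0)
loss = entire_space_loss(propensity, mu1, mu0, t, y)
```

In this sketch every loss term uses labels available for all samples (`t`, `y * t`, `y * (1 - t)`), so no term is restricted to the treated or control subgroup alone.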
