On the Estimation of Treatment Effect with Text Covariates

Estimating treatment effects supports decision making in many domains because it reveals the potential outcomes of different choices. Existing work mainly focuses on numerical covariates, while how to handle textual covariates in treatment effect estimation remains an open question. One major challenge is filtering out nearly instrumental variables, that is, variables that are more predictive of the treatment than of the outcome. Conditioning on such variables when estimating the treatment effect can amplify the estimation bias. To address this challenge, we propose a conditional treatment-adversarial learning based matching method (CTAM). CTAM uses treatment-adversarial learning to filter out information related to nearly instrumental variables while learning representations, and then performs matching on the learned representations to estimate treatment effects. Experimental results on both semi-synthetic and real-world datasets show that conditional treatment-adversarial learning helps reduce the bias of treatment effect estimation.
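The matching step described above can be illustrated with a minimal sketch. It assumes the representations have already been learned and uses 1-nearest-neighbor matching with replacement under Euclidean distance. The function name `matching_ate`, the distance metric, and the matching scheme are illustrative assumptions, not the CTAM procedure itself, and the adversarial representation learning is not shown.

```python
import numpy as np

def matching_ate(reps, treatment, outcome):
    """Estimate the average treatment effect by 1-nearest-neighbor matching.

    reps:      (n, d) array of representations (assumed already learned)
    treatment: (n,) array of 0/1 treatment indicators
    outcome:   (n,) array of observed outcomes
    """
    reps = np.asarray(reps, dtype=float)
    treatment = np.asarray(treatment, dtype=int)
    outcome = np.asarray(outcome, dtype=float)

    treated = np.flatnonzero(treatment == 1)
    control = np.flatnonzero(treatment == 0)
    if treated.size == 0 or control.size == 0:
        raise ValueError("need at least one treated and one control unit")

    # Pairwise Euclidean distances, shape (n_treated, n_control).
    diffs = reps[treated][:, None, :] - reps[control][None, :, :]
    dists = np.linalg.norm(diffs, axis=2)

    # Closest control for each treated unit, closest treated for each control.
    nearest_control = control[dists.argmin(axis=1)]
    nearest_treated = treated[dists.argmin(axis=0)]

    # The matched unit's outcome stands in for the unobserved counterfactual.
    effects = np.empty(len(outcome))
    effects[treated] = outcome[treated] - outcome[nearest_control]
    effects[control] = outcome[nearest_treated] - outcome[control]
    return effects.mean()
```

On a toy input such as `matching_ate([[0.0], [0.1], [1.0], [1.1]], [1, 0, 1, 0], [3, 1, 5, 3])`, each unit is matched to its closest neighbor in the opposite group and the mean of the matched outcome differences is returned. If the representations retain information from nearly instrumental variables, matches can be close in representation space yet poorly balanced on the true confounders, which is the bias the abstract describes.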
