Adjusting for Confounding with Text Matching

We identify situations in which conditioning on text can address confounding in observational studies. We argue that a matching approach is particularly well-suited to this task, but existing matching methods are ill-equipped to handle high-dimensional text data. Our proposed solution is to estimate a low-dimensional summary of the text and condition on this summary via matching. We propose a method of text matching, topical inverse regression matching, that allows the analyst to match both on the topical content of confounding documents and the probability that each of these documents is treated. We validate our approach and illustrate the importance of conditioning on text to address confounding with two applications: the effect of perceptions of author gender on citation counts in the international relations literature and the effects of censorship on Chinese social media users.

Verification Materials: The materials required to verify the computational reproducibility of the results, procedures, and analyses in this article are available on the American Journal of Political Science Dataverse within the Harvard Dataverse Network, at: https://doi.org/10.7910/DVN/HTMX3K.

Social media users in China are censored every day, but it is largely unknown how the experience of being censored affects their future online experience. Are social media users who are censored for the first time flagged by censors for increased scrutiny in the future? Is censorship “targeted” and “customized” toward specific users? Do social media users avoid writing after being censored? Do they continue to write on sensitive topics or do they avoid them? Experimentally manipulating censorship would allow us to make credible causal inferences about the effects of experiencing censorship, but this is impractical and unethical outside of a lab setting.

Inferring causal effects in observational settings is challenging due to confounding. The types of users who are censored might have different opinions that drive them to write differently than the types of users who are not censored. This in turn might affect both the users’ rate of censorship as well as future behavior and outcomes. We argue that conditioning on the text of censored social media posts and other user-level characteristics can substantially decrease or eliminate confounding and allow credible causal inferences with observational data. Intuitively, if we can find nearly identical posts, one of which is censored while the

Margaret E. Roberts is Associate Professor, Department of Political Science, University of California, San Diego, Social Sciences Building 301, 9500 Gilman Drive, #0521, La Jolla, CA 92093-0521 (meroberts@ucsd.edu). Brandon M. Stewart is Assistant Professor and Arthur H. Scribner Bicentennial Preceptor, Department of Sociology, Princeton University, 149 Wallace Hall, Princeton, NJ 08544 (bms4@princeton.edu). Richard A. Nielsen is Associate Professor, Department of Political Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, E53 Room 455, Cambridge, MA 02139 (rnielsen@mit.edu).

We thank the following for helpful comments and suggestions on this work: David Blei, Naoki Egami, Chris Felton, James Fowler, Justin Grimmer, Erin Hartman, Chad Hazlett, Seth Hill, Kosuke Imai, Rebecca Johnson, Gary King, Adeline Lo, Will Lowe, Chris Lucas, Walter Mebane, David Mimno, Jennifer Pan, Marc Ratkovic, Matt Salganik, Caroline Tolbert, and Simone Zhang; audiences at the Princeton Text Analysis Workshop, Princeton Politics Methods Workshop, the University of Rochester, Microsoft Research, the Text as Data Conference, and the Political Methodology Society and the Visions in Methodology conference; and some tremendously helpful anonymous reviewers. We especially thank Dustin Tingley for numerous insightful conversations on the connections between STM and causal inference and Ian Lundberg for extended discussions on some technical details. Dan Maliniak, Ryan Powers, and Barbara Walter graciously supplied data and replication code for the gender and citations study. The JSTOR Data for Research program provided academic journal data for the international relations application.

This research was supported, in part, by the Eunice Kennedy Shriver National Institute of Child Health and Human Development under grant P2-CHD047879 to the Office of Population Research at Princeton University. The research was also supported by grants from the National Science Foundation RIDIR program, award numbers 1738411 and 1738288. This publication was made possible, in part, by a grant from the Carnegie Corporation of New York, supporting Richard Nielsen as an Andrew Carnegie Fellow. The statements made and views expressed are solely the responsibility of the authors.

American Journal of Political Science, Vol. 64, No. 4, October 2020, Pp. 887–903. ©2020, Midwest Political Science Association. DOI: 10.1111/ajps.12526
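
The abstract's core idea, matching on a low-dimensional summary of the text together with each document's probability of treatment, can be illustrated with a small sketch. It is not the authors' implementation. It assumes that per-document topic proportions and treatment scores have already been estimated by some upstream model, and it uses coarsened exact matching as one simple choice of matching rule. All function names, cutpoints, and toy numbers below are hypothetical.

```python
from collections import defaultdict


def coarsen(value, cutpoints):
    """Return the index of the bin that `value` falls into."""
    return sum(value >= c for c in cutpoints)


def match_on_text_summary(topics, scores, treated,
                          topic_cuts=(0.1, 0.5),
                          score_cuts=(0.25, 0.5, 0.75)):
    """Group documents whose coarsened topic proportions and coarsened
    treatment score agree, keeping only groups that contain at least one
    treated and one control document."""
    strata = defaultdict(list)
    for i, (theta, score) in enumerate(zip(topics, scores)):
        key = tuple(coarsen(t, topic_cuts) for t in theta)
        key += (coarsen(score, score_cuts),)
        strata[key].append(i)
    return {
        key: members
        for key, members in strata.items()
        if any(treated[i] for i in members)
        and any(not treated[i] for i in members)
    }


def matched_difference(kept, treated, outcome):
    """Average treated-minus-control outcome difference across matched
    strata, weighting each stratum by its number of treated documents."""
    total, n_treated = 0.0, 0
    for members in kept.values():
        t = [outcome[i] for i in members if treated[i]]
        c = [outcome[i] for i in members if not treated[i]]
        total += len(t) * (sum(t) / len(t) - sum(c) / len(c))
        n_treated += len(t)
    if n_treated == 0:
        raise ValueError("no matched strata; loosen the cutpoints")
    return total / n_treated


# Toy data (hypothetical): four documents, two topics.
topics = [[0.8, 0.2], [0.7, 0.3], [0.05, 0.95], [0.6, 0.4]]
scores = [0.6, 0.7, 0.9, 0.1]          # assumed treatment scores
treated = [True, False, True, False]   # e.g. censored or not
outcome = [3.0, 1.0, 5.0, 0.0]

kept = match_on_text_summary(topics, scores, treated)
estimate = matched_difference(kept, treated, outcome)
print(kept, estimate)
```

In this toy example only the first two documents share a stratum, so the unmatched treated and control documents are pruned before the difference is computed. Coarser cutpoints keep more documents at the cost of less similar matches.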
