Paper information - How to make causal inferences using texts

How to make causal inferences using texts

New text-as-data techniques offer a great promise: the ability to inductively discover, from large collections of text, measures that are useful for testing social science theories of interest. We introduce a conceptual framework for making causal inferences with discovered measures as a treatment or outcome. Our framework enables researchers to discover high-dimensional textual interventions and estimate the ways that observed treatments affect text-based outcomes. We argue that nearly all text-based causal inferences depend upon a latent representation of the text, and we provide a framework to learn the latent representation. But estimating this latent representation, we show, creates new risks: we may introduce an identification problem or overfit. To address these risks, we describe a split-sample framework and apply it to estimate causal effects from an experiment on immigration attitudes and a study on bureaucratic response. Our work provides a rigorous foundation for text-based causal inferences.
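
The split-sample idea in the abstract can be illustrated with a minimal sketch. It assumes a randomized binary treatment and free-text responses as the outcome. The keyword-based measure is a deliberately simple stand-in for whatever latent representation a researcher would actually learn (for example, a topic model), and all function names and the toy data are hypothetical, not taken from the paper. The point it shows is the ordering: the text measure is discovered on one half of the data, frozen, and only then applied to the held-out half to estimate the effect.

```python
import random
from collections import Counter


def split_sample(units, frac=0.5, seed=0):
    """Split (text, treatment) pairs into discovery and estimation sets,
    stratifying by treatment arm so both sets contain both arms."""
    rng = random.Random(seed)
    discovery, estimation = [], []
    for arm in (0, 1):
        group = [u for u in units if u[1] == arm]
        rng.shuffle(group)
        cut = int(len(group) * frac)
        discovery += group[:cut]
        estimation += group[cut:]
    return discovery, estimation


def discover_measure(discovery):
    """Toy discovery step (an assumption, standing in for a learned latent
    representation): pick the word whose usage rate differs most by arm."""
    counts = {0: Counter(), 1: Counter()}
    sizes = {0: 0, 1: 0}
    for text, arm in discovery:
        counts[arm].update(set(text.lower().split()))
        sizes[arm] += 1
    vocab = sorted(set(counts[0]) | set(counts[1]))

    def gap(word):
        return abs(counts[1][word] / sizes[1] - counts[0][word] / sizes[0])

    keyword = max(vocab, key=gap)

    def g(text):
        # The measure is fixed from here on; it never sees estimation data.
        return 1.0 if keyword in text.lower().split() else 0.0

    return keyword, g


def difference_in_means(estimation, g):
    """Estimate the average treatment effect on the text-based outcome g."""
    treated = [g(text) for text, arm in estimation if arm == 1]
    control = [g(text) for text, arm in estimation if arm == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


# Hypothetical toy data: 20 treated and 20 control responses.
units = [("worried about jobs", 1)] * 20 + [("welcome new neighbors", 0)] * 20

discovery, estimation = split_sample(units)
keyword, g = discover_measure(discovery)      # uses the discovery half only
effect = difference_in_means(estimation, g)   # uses the held-out half only
```

Because the measure `g` is chosen without reference to the estimation half, the effect estimate is not contaminated by the search over candidate measures, which is the overfitting risk the abstract raises.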
