Computational Analysis of Political Texts: Bridging Research Efforts Across Communities

Over the last twenty years, political scientists have increasingly adopted and developed natural language processing (NLP) methods to exploit text as an additional source of data in their analyses. In the last decade, the use of computational methods for analyzing political texts has expanded drastically in scope, fueling sustained growth of the text-as-data community in political science. Political scientists have applied NLP methods to a range of analysis types and tasks. These include inferring the policy positions of actors from textual evidence, detecting topics in political texts, and analyzing stylistic aspects of political texts (e.g., assessing the role of language ambiguity in framing the political agenda). As in many other domains, much of this work has been enabled and facilitated by the development of resources such as topically coded electoral programmes (e.g., the Manifesto Corpus) and topically coded legislative texts (e.g., the Comparative Agendas Project).

Political scientists, however, created these resources and applied available NLP methods to textual data largely in isolation from the NLP community. At the same time, NLP researchers addressed closely related tasks such as election prediction, ideology classification, and stance detection. In other words, the two communities have remained largely unaware of one another, with NLP researchers mostly unaware of interesting applications in political science and political scientists not applying cutting-edge NLP methodology to their problems. The main goal of this tutorial is to systematize and analyze the body of research on political texts from both communities. We aim to provide a gentle, all-round introduction to methods and tasks related to the computational analysis of political texts. Our vision is to bring the two research communities closer together and to contribute to faster and more significant progress in this interdisciplinary research area.
