Multidimensional topic analysis in political texts

Abstract: Automatic content analysis is increasingly accepted as a research method in social science. In political science, researchers use party manifestos and transcripts of political speeches to analyze the positions of different actors. Existing approaches are limited to a single dimension; in particular, they cannot distinguish between positions with respect to a specific topic. In this paper, we propose a method for analyzing and comparing documents according to a set of predefined topics. The method is based on an extension of Latent Dirichlet Allocation (LDA) that induces knowledge about the relevant topics. We validate the method by showing that it can predict which member of a coalition was assigned a certain ministry, based on a comparison of the parties' election manifestos with the coalition contract. We apply the method to German national elections since 1990 and show that it consistently outperforms a baseline that simulates manual annotation of individual sentences based on keywords and standard text comparison. In our experiments, we compare two different extensions of LDA and investigate the influence of the seed set used. Finally, we briefly illustrate how the output of our method can be interpreted to compare positions on specific topics across several parties.
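
The validation idea described above, comparing each party's topic-specific manifesto text with the corresponding part of the coalition contract, can be illustrated with a minimal sketch. It is not the paper's method: it replaces the LDA-based topic representation with plain term-frequency vectors and cosine similarity, and the party names, texts, and the function name `predict_ministry_holder` are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Map lowercased whitespace tokens to their counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_ministry_holder(manifesto_passages, contract_passage):
    """Return the party whose passage on a topic is closest to the contract.

    Assumption: the passages have already been restricted to one topic
    (in the paper this is the job of the seeded topic model).
    """
    contract_vec = term_vector(contract_passage)
    scores = {
        party: cosine(term_vector(text), contract_vec)
        for party, text in manifesto_passages.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy data for a single topic (finance).
manifestos = {
    "Party A": "lower income tax and balanced budget",
    "Party B": "expand renewable energy and protect climate",
}
contract = "the coalition will balance the budget and lower the income tax"

winner, scores = predict_ministry_holder(manifestos, contract)
print(winner, scores)
```

In this toy setting, the party whose wording on the topic overlaps most with the contract is taken as the likely holder of the ministry; a topic model would supply the topic-specific passages or distributions that this sketch assumes as given.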
