Validating Cross-Perspective Topic Modeling for Extracting Political Parties' Positions from Parliamentary Proceedings

Several topic models for viewpoint extraction have been introduced in the literature. Because these studies generally do not present thorough validations of the models they introduce, it is not clear in advance which topic modeling technique will work best for our use case of extracting viewpoints of political parties from parliamentary proceedings. We argue that the usefulness of methods like topic modeling depends on whether they yield valid and reliable results on real-world data, which means that validation studies are needed. In this paper, we present such a study for an existing topic model for viewpoint extraction called cross-perspective topic modeling [11]. The model is applied to Dutch parliamentary proceedings, and the resulting topics and opinions are validated using external data. The results of our validation show that the model yields valid topics (content and criterion validity), and opinions with content validity. We conclude that cross-perspective topic modeling is a promising technique for extracting political parties' positions from parliamentary proceedings. In addition, by exploring a number of validation methods, we demonstrate that validating topic models is feasible, even without extensive domain knowledge.

[1] Jaap Kamps, et al. Palmetto position storing Lucene index of Dutch Wikipedia, 2016.

[2] Edward G. Carmines, et al. Reliability and Validity Assessment, 1979.

[3] Sandra L. Resodihardjo, et al. Political Attention in a Coalition System: Analysing Queen's Speeches in the Netherlands 1945–2007, 2009.

[4] Justin Grimmer, et al. Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts, 2013, Political Analysis.

[5] Michael Röder, et al. Exploring the Space of Topic Coherence Measures, 2015, WSDM.

[6] Luo Si, et al. Mining contrastive opinions on political texts using cross-perspective topic model, 2012, WSDM '12.

[7] G. Breeman, et al. Morality Issues in the Netherlands: Coalition Politics under Pressure, 2012.

[8] Noah A. Smith, et al. Learning Topics and Positions from Debatepedia, 2013, EMNLP.

[9] Wei-Hao Lin, et al. A Joint Topic and Perspective Model for Ideological Discourse, 2008, ECML/PKDD.

[10] Mohand Boughanem, et al. VODUM: A Topic Model Unifying Viewpoint, Topic and Opinion Discovery, 2016, ECIR.

[11] Mark Steyvers, et al. Finding scientific topics, 2004, Proceedings of the National Academy of Sciences of the United States of America.

[12] M. Kendall. A New Measure of Rank Correlation, 1938.

[13] Maarten Marx, et al. Are Topically Diverse Documents Also Interesting?, 2015, CLEF.

[14] Nando de Freitas, et al. An Introduction to MCMC for Machine Learning, 2004, Machine Learning.

[15] Michael J. Paul, et al. A Two-Dimensional Topic-Aspect Model for Discovering Multi-Faceted Topics, 2010, AAAI.

[16] Navneet Kaur, et al. Opinion mining and sentiment analysis, 2016, 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom).

[17] Chong Wang, et al. Reading Tea Leaves: How Humans Interpret Topic Models, 2009, NIPS.

[18] W. Marsden. I and J, 2012.

[19] Walter Daelemans, et al. An efficient memory-based morphosyntactic tagger and parser for Dutch, 2007, CLIN 2007.

[20] Gerlof Bouma, et al. Normalized (pointwise) mutual information in collocation extraction, 2009.

[21] Osmar R. Zaïane, et al. Mining Contentious Documents Using an Unsupervised Topic Model Based Approach, 2014, 2014 IEEE International Conference on Data Mining.

[22] Michael I. Jordan, et al. Latent Dirichlet Allocation, 2001, J. Mach. Learn. Res.

[23] Dragomir R. Radev, et al. How to Analyze Political Attention with Minimal Assumptions and Costs, 2010.