Cross-Lingual Classification of Topics in Political Texts

In this paper, we propose an approach for cross-lingual topical coding of sentences from the electoral manifestos of political parties written in different languages. To this end, we exploit continuous semantic text representations and induce a joint multilingual semantic vector space, which enables supervised learning from manually coded sentences across languages. Our experimental results show that classifiers trained on multilingual data outperform monolingual topic classifiers.
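The idea of a joint multilingual vector space can be illustrated with a minimal sketch. It rests on assumptions that are not stated in the abstract: the shared space is induced by a linear projection fitted by least squares on a small seed dictionary, sentences are represented by averaged word vectors, and the classifier is a nearest-centroid rule. The vocabulary, topic labels, and random toy embeddings are hypothetical stand-ins for pretrained embeddings and manually coded manifesto sentences, and the paper's actual models may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4

# Toy English embeddings (assumption: real work would load pretrained vectors).
en_vecs = {w: rng.normal(size=dim)
           for w in ["tax", "budget", "school", "teacher", "army", "soldier"]}

# Toy German space: the English space under a hidden rotation, so that a
# linear map between the two spaces exists by construction.
hidden_rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
de_to_en = {"steuer": "tax", "haushalt": "budget", "schule": "school",
            "lehrer": "teacher", "armee": "army", "soldat": "soldier"}
de_vecs = {de: en_vecs[en] @ hidden_rotation for de, en in de_to_en.items()}

# Fit the projection W on a seed dictionary, holding out "soldat".
seed_pairs = [(de, en) for de, en in de_to_en.items() if de != "soldat"]
X = np.stack([de_vecs[de] for de, _ in seed_pairs])
Y = np.stack([en_vecs[en] for _, en in seed_pairs])
W, *_ = np.linalg.lstsq(X, Y, rcond=None)  # minimizes ||X W - Y||


def embed(tokens, vecs, mapping=None):
    """Average the word vectors; optionally project into the shared space."""
    v = np.mean([vecs[t] for t in tokens], axis=0)
    return v if mapping is None else v @ mapping


# Hypothetical manually coded English sentences, one per topic.
train = [(["tax", "budget"], "economy"),
         (["school", "teacher"], "education"),
         (["army", "soldier"], "military")]
centroids = {label: embed(tokens, en_vecs) for tokens, label in train}


def predict(tokens, vecs, mapping=None):
    """Return the topic whose centroid is closest in the shared space."""
    v = embed(tokens, vecs, mapping)
    return min(centroids, key=lambda label: np.linalg.norm(v - centroids[label]))


# Classify a German sentence with a classifier trained only on English data.
print(predict(["armee", "soldat"], de_vecs, W))
```

In this toy setup the projection is recovered exactly, because the German space is a rotation of the English one. With real embeddings the mapping is only approximate, and its quality depends on the size and coverage of the seed dictionary.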
