Theory-Driven Analysis of Large Corpora: Semisupervised Topic Classification of the UN Speeches

There is growing interest in quantitative analysis of large corpora among international relations (IR) scholars, but many of them find it difficult to perform analysis consistently with exist...
