Distributed Framework for Political Event Coding in Real-Time

Studying political activities and interactions between different entities is gradually becoming a prominent field of research for both social science and computer science researchers. Research is carried out at both local (limited to a particular region) and global scales, often segmented by time period. An up-to-date analysis also requires the most recent data. For these purposes, we need timestamped, structured information about political interactions. With this in mind, we develop a distributed framework that uses Apache Kafka and Apache Spark to process a global collection of news data in different languages (e.g., Spanish, Arabic) in real time and generate such structured data. We also provide an API for easy access to the data. In this paper, we describe how the system works, how to access the data, and possible analytical problems that can be addressed by building models on the dataset.
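The data flow described above (news articles streamed in, coded into timestamped records about political interactions, and served to users) can be illustrated with a toy, single-process sketch. It does not use Kafka or Spark. A `queue.Queue` stands in for a Kafka topic, and a generator that drains it in fixed-size batches mimics the micro-batch model of Spark Streaming. The `PoliticalEvent` schema, the `ACTION_KEYWORDS` table, and the one-word-actor headline coder are all hypothetical placeholders, not the framework's actual event coder or record format.

```python
import json
from dataclasses import asdict, dataclass
from queue import Queue
from typing import Dict, Iterator, List


@dataclass(frozen=True)
class PoliticalEvent:
    """Hypothetical timestamped 'who did what to whom' record."""
    date: str    # publication date of the source article
    source: str  # actor initiating the interaction
    target: str  # actor on the receiving end
    action: str  # coarse interaction label (hypothetical, not a real codebook)


# Hypothetical verb -> action table standing in for a real event coder.
ACTION_KEYWORDS = {"met": "CONSULT", "condemned": "DISAPPROVE", "attacked": "ATTACK"}


def code_article(article: Dict[str, str]) -> List[PoliticalEvent]:
    """Toy coder: assumes one-word actors around a known verb ('A verb B')."""
    words = article["headline"].split()
    events = []
    for i, word in enumerate(words):
        action = ACTION_KEYWORDS.get(word.lower())
        # Only code a verb that has an actor on both sides.
        if action and 0 < i < len(words) - 1:
            events.append(PoliticalEvent(article["date"], words[i - 1], words[i + 1], action))
    return events


def micro_batches(topic: Queue, batch_size: int) -> Iterator[List[Dict[str, str]]]:
    """Drain the queue in fixed-size batches, mimicking micro-batch streaming."""
    batch = []
    while not topic.empty():
        batch.append(topic.get())
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def run_pipeline(articles: List[Dict[str, str]], batch_size: int = 2) -> List[str]:
    topic: Queue = Queue()  # stands in for a Kafka topic of raw articles
    for article in articles:  # "producer" side: publish articles to the topic
        topic.put(article)
    records = []
    for batch in micro_batches(topic, batch_size):  # "consumer" side
        for article in batch:
            # Serialize each coded event as JSON, e.g. for storage behind an API.
            records.extend(json.dumps(asdict(e)) for e in code_article(article))
    return records


if __name__ == "__main__":
    news = [
        {"date": "2017-03-01", "headline": "France condemned Syria"},
        {"date": "2017-03-02", "headline": "Weather report for Tuesday"},
        {"date": "2017-03-03", "headline": "Colombia met Venezuela"},
    ]
    for record in run_pipeline(news):
        print(record)
```

Running the script prints one JSON record for each of the two headlines that match a known verb; the weather headline yields no event. In a real deployment, the queue would be replaced by Kafka producers and consumers, and the loop over batches by a distributed Spark job, so that coding scales across many articles and languages.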
