A Joint Human/Machine Process for Coding Events and Conflict Drivers

Constructing datasets to analyse the progression of conflicts has been a longstanding objective of peace and conflict studies research. In essence, the problem is to reliably extract relevant text snippets and code (annotate) them using an ontology that is meaningful to social scientists. Such an ontology usually characterizes types of violent events (killing, bombing, etc.), the underlying drivers of conflict, or both. Conflict drivers are themselves hierarchically structured: broad categories such as security, governance and economics are subdivided into conflict-specific indicators. Numerous coding approaches have been proposed in the social science literature, ranging from fully automated “machine” coding to human coding. Machine coding is highly error-prone, especially for labelling complex drivers, and tends to extract duplicate events. Human coding is expensive and suffers from inconsistency between annotators. Hybrid approaches are therefore required. In this paper, we analyse experimentally how human input can most effectively be used in a hybrid system to complement machine coding. Using two newly created real-world datasets, we show that machine learning methods improve on rule-based automated coding for filtering large volumes of input, while human verification of relevant/irrelevant text improves the performance of machine learning for predicting multiple labels in the ontology.
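The filter-then-label pipeline described above can be illustrated with a small sketch. It is a toy under stated assumptions, not the method evaluated in this paper. The abstract does not name a classifier, so a from-scratch multinomial Naive Bayes model stands in for "machine learning". `EVENT_KEYWORDS` is a hypothetical dictionary standing in for rule-based coding. `human_verify` is a hypothetical callback representing the annotator who marks snippets relevant or irrelevant. Multi-label prediction over the ontology is done one-vs-rest, with one binary classifier per label. The labels `security` and `governance` come from the examples in the text, and all training sentences are invented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-cased alphabetic tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


class BinaryNaiveBayes:
    """Multinomial Naive Bayes with add-alpha smoothing for a yes/no decision."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.counts = {True: Counter(), False: Counter()}
        self.priors = Counter(labels)
        for doc, label in zip(docs, labels):
            self.counts[label].update(tokenize(doc))
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        return self

    def _log_score(self, tokens, label):
        n = sum(self.priors.values())
        # Smoothed class prior, so a class absent from training never gives log(0).
        score = math.log((self.priors[label] + self.alpha) / (n + 2 * self.alpha))
        denom = sum(self.counts[label].values()) + self.alpha * len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # words never seen in training are ignored
                score += math.log((self.counts[label][tok] + self.alpha) / denom)
        return score

    def predict(self, doc):
        tokens = tokenize(doc)
        return self._log_score(tokens, True) >= self._log_score(tokens, False)


# Hypothetical keyword dictionary standing in for a rule-based coder.
EVENT_KEYWORDS = {"killed", "bombing", "attack", "clashes"}


def rule_based_filter(doc):
    return any(tok in EVENT_KEYWORDS for tok in tokenize(doc))


def hybrid_filter(train_docs, train_labels, new_docs, human_verify):
    """The model pre-filters new_docs; a human (hypothetical callback) confirms or
    rejects each flagged snippet, and the verified labels are added for retraining."""
    model = BinaryNaiveBayes().fit(train_docs, train_labels)
    flagged = [d for d in new_docs if model.predict(d)]
    verified = [(d, human_verify(d)) for d in flagged]
    retrained = BinaryNaiveBayes().fit(
        train_docs + [d for d, _ in verified],
        train_labels + [y for _, y in verified],
    )
    return [d for d, y in verified if y], retrained


def fit_one_vs_rest(docs, label_sets, ontology):
    """One binary classifier per ontology label (multi-label via one-vs-rest)."""
    return {lab: BinaryNaiveBayes().fit(docs, [lab in s for s in label_sets])
            for lab in ontology}


def predict_labels(models, doc):
    return {lab for lab, m in models.items() if m.predict(doc)}


# Toy relevance data, invented for illustration only.
train_docs = [
    "militants killed three soldiers in an attack",
    "a car bombing killed civilians at the market",
    "the football team won the league",
    "new cafe opens downtown",
]
train_labels = [True, True, False, False]
relevance = BinaryNaiveBayes().fit(train_docs, train_labels)

new_docs = ["soldiers killed in bombing", "the football league resumes",
            "critics attack the new cafe menu"]
kept, retrained = hybrid_filter(train_docs, train_labels, new_docs,
                                human_verify=lambda d: "killed" in d)

# Toy multi-label data over a two-label slice of a drivers ontology.
ontology = ["security", "governance"]
coded_docs = [
    "militants killed soldiers in an attack",
    "parliament passed an anti-corruption law",
    "police killed protesters after the disputed election",
    "ministers accused of corruption",
]
coded_labels = [{"security"}, {"governance"}, {"security", "governance"}, {"governance"}]
models = fit_one_vs_rest(coded_docs, coded_labels, ontology)
print(predict_labels(models, "militants attack police"))  # {'security'}
```

On this toy data the keyword rule flags "critics attack the new cafe menu" as relevant because it contains "attack", whereas the trained filter rejects it. This only illustrates the mechanism; it says nothing about the paper's datasets or results. Ties in `predict` resolve towards "relevant", which errs on the side of sending borderline snippets to the human verifier.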