A Task Set Proposal for Automatic Protest Information Collection Across Multiple Countries

We propose a coherent set of tasks for protest information collection in the context of generalizable natural language processing. The tasks are news article classification, event sentence detection, and event extraction. Tools that collect event information from data produced in multiple countries enable comparative studies in sociology and politics. We have annotated English news articles from a source country and a target country so that we can measure how tools developed on data from one country perform on data from another. Our preliminary experiments show that the performance of tools developed using English texts from India drops to an unusable level when they are applied to English texts from China. We think our setting addresses the challenge of building generalizable NLP tools that perform well regardless of the source of the text, and that it will accelerate progress toward generalizable NLP systems.
