Developing Annotated Resources for Internal Displacement Monitoring

This paper describes in detail the design and development of a novel annotation framework and annotated resources for internal displacement, the outcome of a collaboration with the Internal Displacement Monitoring Centre aimed at improving the accuracy of its monitoring platform, IDETECT. The schema provides a multi-faceted description of displacement events, including cause, number of people displaced, location, and date. We also propose higher-order facets intended to improve information extraction, such as document relevance and document type. We report a case study applying machine learning to the document classification tasks. Finally, we discuss the importance of a standardized schema in benchmark dataset development and its impact on the development of reliable disaster monitoring infrastructure.
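To make the facets listed above concrete, the following is a minimal sketch of how one annotated document might be represented. It is not the schema's actual specification: the class names, field names, and types are all hypothetical, chosen only to mirror the facets named in the abstract (cause, quantity, location, date, document relevance, document type).

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class DisplacementEvent:
    """Event-level facets. All field names are hypothetical."""
    cause: str                      # cause of the displacement
    displaced_count: Optional[int]  # quantity of people displaced, if stated
    location: str                   # location as mentioned in the text
    event_date: Optional[date]      # date of the event, if stated


@dataclass
class AnnotatedDocument:
    """Document-level (higher-order) facets plus the events found in the text."""
    text: str
    relevant: bool                  # document relevance
    doc_type: str                   # document type; the label set is an assumption
    events: List[DisplacementEvent] = field(default_factory=list)


def relevant_events(docs: List[AnnotatedDocument]) -> List[DisplacementEvent]:
    """Collect events only from documents marked as relevant."""
    return [event for doc in docs if doc.relevant for event in doc.events]
```

Keeping the document-level facets separate from the event-level ones reflects the idea that relevance and type can be used to filter documents before any event extraction is attempted.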
