Twitter as a Lifeline: Human-annotated Twitter Corpora for NLP of Crisis-related Messages

Microblogging platforms such as Twitter provide active communication channels during mass convergence and emergency events such as earthquakes and typhoons. During the sudden onset of a crisis, affected people post useful information on Twitter that can be used for situational awareness and other humanitarian disaster response efforts, if it is processed in a timely and effective manner. Processing social media information poses multiple challenges, including parsing noisy, brief, and informal messages, learning information categories from the incoming stream of messages, and classifying messages into those categories. A basic requirement for many of these tasks is the availability of data, in particular human-annotated data. In this paper, we present human-annotated Twitter corpora collected during 19 different crises that took place between 2013 and 2015. To demonstrate the utility of the annotations, we train machine learning classifiers. Moreover, we publish the first and largest word2vec word embeddings trained on 52 million crisis-related tweets. To deal with language issues in tweets, we present human-annotated normalized lexical resources for different lexical variations.
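As an illustration of how a normalized lexical resource of this kind might be applied, the following minimal Python sketch replaces out-of-vocabulary variants with their normalized forms by dictionary lookup. The lexicon entries and the function name `normalize_tweet` are hypothetical and chosen only for illustration; they are not taken from the published resource, and the paper's own normalization procedure may differ.

```python
import re

# Hypothetical variant -> normalized form mappings, for illustration only.
# These are NOT entries from the published lexical resource.
EXAMPLE_LEXICON = {
    "plz": "please",
    "ppl": "people",
    "govt": "government",
    "bldg": "building",
}


def normalize_tweet(text, lexicon=EXAMPLE_LEXICON):
    """Replace each alphabetic token found in the lexicon with its normalized form.

    Lookup is case-insensitive; tokens not in the lexicon are left unchanged,
    as are punctuation, digits, and whitespace.
    """
    def replace(match):
        word = match.group(0)
        return lexicon.get(word.lower(), word)

    # Match runs of letters (and apostrophes) so that punctuation stays in place.
    return re.sub(r"[A-Za-z']+", replace, text)


if __name__ == "__main__":
    print(normalize_tweet("Plz help, ppl trapped in bldg"))
```

A plain dictionary lookup is the simplest possible design. It cannot resolve variants whose normalized form depends on context, which is one reason human-annotated mappings are valuable.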
