Towards a Data-driven Approach to Identify Crisis-Related Topics in Social Media Streams

Categorizing any type of user-generated content online is challenging, and categorizing social media messages during a crisis adds a further layer of complexity, both because of the volume and variability of the information and because messages must be classified as soon as they arrive. Current approaches use automatic classification, human classification, or a mixture of both. In all of these approaches there are several reasons, which we examine here, to keep the set of information categories small and up to date. This means that at the onset of a crisis an expert must select a handful of categories into which incoming messages will be classified. As the crisis unfolds, this initial set must then be revised dynamically as new information is posted online. In this paper, we propose an effective way to dynamically extract emerging, potentially interesting new categories from social media data.
