Early detection of heterogeneous disaster events using social media

This article addresses the problem of detecting crisis-related messages on social media in order to improve the situational awareness of emergency services. Previous work focused on developing machine-learning classifiers restricted to specific disasters, such as storms or wildfires. We investigate, for the first time, methods to detect such messages when the type of crisis is not known in advance, that is, when the data are highly heterogeneous. Data heterogeneity makes it significantly harder for learning algorithms to generalize and to label incoming data accurately. Our main contributions are as follows. First, we evaluate the extent of this problem in the context of disaster management, finding that the performance of traditional learners drops by up to 40% when they are trained and tested on heterogeneous data vis-à-vis homogeneous data. Then, in order to overcome data heterogeneity, we propose a new ensemble learning method and find that it performs on a par with the Gradient Boosting and AdaBoost ensemble learners. The methods are studied on a benchmark data set comprising 26 disaster events and four classification problems: detection of relevant messages, informative messages, and eyewitness reports, and topical classification of messages. Finally, in a case study, we evaluate the proposed methods on a real-world data set to assess their practical value.
