A rapid-prototyping framework for extracting small-scale incident-related information in microblogs: Application of multi-label classification on tweets

Small-scale incidents such as car crashes or fires occur with high frequency and, in sum, involve more people and consume more money than large, infrequent incidents. Supporting small-scale incident management is therefore of high importance.

Microblogs are an important source of information for incident management, as both citizens and official sources share important situational information. While microblogs are already used to address large-scale incidents, detecting small-scale incident-related information has not been satisfactorily possible so far.

In this paper, we investigate small-scale incident reporting behavior in microblogs. Based on our findings, we present an easily extensible rapid-prototyping framework for extracting information from incident-related tweets. The framework enables the precise identification and extraction of information relevant to emergency management. We evaluate the rapid-prototyping capabilities and usefulness of the framework by implementing multi-label classification of tweets related to small-scale incidents. The evaluation shows that our approach is applicable for detecting multiple labels, with a match rate of 84.35%.
