Evaluating Multi-label Classification of Incident-related Tweets

Microblogs are an important source of information in emergency management, as both citizens and official sources share a great deal of situational information. Machine learning has been shown to identify incident-related information within the huge amount of available data. Nevertheless, current classification techniques assign only a single label to a micropost, losing information that would be valuable for crisis management. In this paper, we contribute the first in-depth analysis of multi-label classification of incident-related tweets. We present an approach that assigns multiple labels to these messages, providing additional information about the situation at hand. Our evaluation shows that multi-label classification can detect multiple labels with an exact match of 84.35%, making it a valuable means of classifying incident-related tweets. Furthermore, we show that correlation between labels can be taken into account for these kinds of classification tasks.
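To make the exact-match figure concrete, the following minimal sketch shows how this measure is commonly computed for multi-label predictions: a tweet counts as correct only if its predicted label set equals the gold label set in full. The function name `exact_match`, the label names, and the toy data are illustrative assumptions and are not taken from the paper or its dataset.

```python
from typing import List, Set


def exact_match(y_true: List[Set[str]], y_pred: List[Set[str]]) -> float:
    """Fraction of instances whose predicted label set equals the gold set."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one instance is required")
    # A partially correct prediction (e.g. one of two labels) scores zero.
    hits = sum(1 for gold, pred in zip(y_true, y_pred) if gold == pred)
    return hits / len(y_true)


# Hypothetical gold and predicted label sets for four tweets.
gold = [{"fire"}, {"crash", "injury"}, {"shooting"}, set()]
pred = [{"fire"}, {"crash"}, {"shooting"}, set()]

score = exact_match(gold, pred)
print(f"exact match: {score:.2%}")
```

In this toy example the second tweet is only partially correct (the `injury` label is missed), so it contributes nothing to the score. This strictness is why exact match is usually considered a demanding measure for multi-label classifiers.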
