Comparative Analysis of Different Classifiers on Crisis-Related Tweets: An Elaborate Study

Twitter is a popular micro-blogging platform that has gained considerable popularity in the last few years and offers a diverse source of real-time information about different events, often during mass crises. During any crisis, it is necessary to filter a huge volume of tweets rapidly to extract incident-related information. Different machine learning (ML) algorithms have been used to separate crisis-related tweets from non-crisis-related ones, and this classification is of great importance in constructing an emergency management framework. These algorithms rely heavily on the datasets used and on different hyper-parameters, which need to be tuned to provide better performance. Hence, this paper focuses on: (1) different Natural Language Processing (NLP) techniques that make tweets suitable for applying ML algorithms, (2) hyper-parameter tuning of neural networks when used as classifiers on short messages such as tweets, and (3) a comparative analysis of different state-of-the-art ML algorithms (classifiers) that can be applied to categorize crisis-related tweets with higher accuracy. The experiments were conducted on six different crisis-related datasets, each consisting of approximately 10,000 tweets. The analysis shows that Support Vector Machines and Logistic Regression performed significantly better than Naive Bayes and Neural Networks (NN), achieving accuracy as high as 96% (although results vary across datasets). With proper hyper-parameter tuning, NN also showed promising results.
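As an illustration of focus area (1), the following is a minimal sketch of how a raw tweet might be normalized before it is passed to a classifier. The abstract does not specify which preprocessing steps the study applies, so the steps shown here (lowercasing, removing URLs and user mentions, keeping hashtag words, dropping punctuation and a few stopwords) are assumptions, as are the function name `preprocess_tweet` and the tiny `STOPWORDS` set.

```python
import re

# Assumed patterns for common tweet artifacts; not taken from the paper.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")

# A deliberately tiny, illustrative stopword list (hypothetical).
STOPWORDS = {"a", "an", "the", "is", "are", "in", "on", "at", "of", "to", "and", "rt"}


def preprocess_tweet(text):
    """Return a list of normalized tokens for one tweet."""
    text = text.lower()
    text = URL_RE.sub(" ", text)        # drop links
    text = MENTION_RE.sub(" ", text)    # drop @user mentions
    text = text.replace("#", " ")       # keep the hashtag word, drop the symbol
    text = NON_ALNUM_RE.sub(" ", text)  # drop remaining punctuation
    return [tok for tok in text.split() if tok not in STOPWORDS]


if __name__ == "__main__":
    print(preprocess_tweet("RT @user: Flooding in #Queensland! Roads closed http://t.co/abc"))
```

The resulting token lists could then be turned into bag-of-words or TF-IDF features for any of the classifiers compared in the study.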
