Multi-class Twitter sentiment classification with emojis

Purpose Recently, various Twitter Sentiment Analysis (TSA) techniques have been developed, but few have paid attention to emojis, a popular microblogging feature, and few works have addressed the multi-class sentiment analysis of tweets. The purpose of this paper is to take the popularity of emojis on Twitter into account and investigate the feasibility of an emoji training heuristic for multi-class sentiment classification of tweets. Tweets from the “2016 Orlando nightclub shooting” were used as the source of study. This study also aims to demonstrate how mapping can contribute to interpreting sentiments.
Design/methodology/approach The authors presented a methodological framework to collect, pre-process, analyse and map public Twitter postings related to the shooting. The authors designed and implemented an emoji training heuristic, which automatically prepares the training data set, a feature needed in Big Data research. The authors improved upon the previous framework by advancing the pre-processing techniques, enhancing feature engineering and optimising the classification models. The authors constructed the sentiment model with a logistic regression classifier and selected features. Finally, the authors presented how to visualise citizen sentiments on maps dynamically using Mapbox.
Findings The sentiment model, constructed from training sets automatically annotated with the emoji approach and from selected features, performs well in classifying tweets into five different sentiment classes, with a macro-averaged F-measure of 0.635, a macro-averaged accuracy of 0.689 and an MAEM of 0.530. Compared with the experimental results reported in related works, these results are satisfactory, indicating that the model is effective and that the proposed emoji training heuristic is useful and feasible in multi-class TSA. The maps the authors created provide a much easier-to-understand visual representation of the data and make it more efficient to monitor citizen sentiments and their distributions.
Originality/value This work appears to be the first to conduct multi-class sentiment classification on Twitter with automatic annotation of training sets using emojis. Because little attention has been paid to applying TSA to monitor the public’s attitudes towards terror attacks and a country’s gun policies, the authors consider this a pioneering work. In addition, the authors have introduced a new data set of 2016 Orlando Shooting tweets, which will be made available for other researchers to mine the public’s political opinions about gun policies.
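
The abstract does not specify which emojis map to which sentiment class, so the following is only a minimal sketch of how an emoji training heuristic and a macro-averaged mean absolute error could look. The mapping `EMOJI_CLASS`, the five-point scale from -2 to 2, and the function names `label_by_emoji` and `macro_mae` are all assumptions made for illustration, not the authors' implementation. The macro-averaged error here is taken to be the per-class mean absolute error averaged over the true classes.

```python
from collections import defaultdict

# Hypothetical emoji-to-class mapping on a five-point scale (-2 .. 2).
# The emoji sets actually used in the paper are not given in the abstract.
EMOJI_CLASS = {
    "😡": -2,
    "😢": -1,
    "😐": 0,
    "🙂": 1,
    "😍": 2,
}


def label_by_emoji(tweet):
    """Return (text_without_emojis, label), or None if the tweet cannot be labelled.

    A tweet is labelled only when all of its mapped emojis agree on one class.
    The emojis are stripped from the text so a classifier cannot simply
    memorise the label source.
    """
    labels = {EMOJI_CLASS[ch] for ch in tweet if ch in EMOJI_CLASS}
    if len(labels) != 1:
        return None  # no mapped emoji, or conflicting emojis
    text = "".join(ch for ch in tweet if ch not in EMOJI_CLASS)
    return " ".join(text.split()), labels.pop()


def macro_mae(y_true, y_pred):
    """Mean absolute error computed per true class, then averaged over classes."""
    errors = defaultdict(list)
    for t, p in zip(y_true, y_pred):
        errors[t].append(abs(t - p))
    per_class = [sum(e) / len(e) for e in errors.values()]
    return sum(per_class) / len(per_class)
```

Under these assumptions, `label_by_emoji("So sad 😢 today")` yields a cleaned text with label -1, while tweets with no mapped emoji or with conflicting emojis are skipped rather than guessed. Averaging the error per class keeps frequent classes from dominating the score, which matters when the automatically labelled classes are imbalanced.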
