Evaluating the Effectiveness of Hashtags as Predictors of the Sentiment of Tweets

Research interest in the sentiment analysis of tweets has grown in recent years, but the contribution of Twitter-specific features to this task still needs to be examined. One such feature is the hashtag, a user-defined topic label. In our study, we compare the performance of sentiment and non-sentiment hashtags in classifying tweets as positive or negative. By combining subjective words from different lexical resources, we achieve accuracy scores of 83.58% and 83.83% in identifying sentiment hashtags and non-sentiment hashtags, respectively. These scores surpass those obtained using models that apply a single lexical resource. We then apply derived properties of sentiment and non-sentiment hashtags, including their sentiment polarity, to classify tweets. Our best classification models achieve accuracy scores of 81.14% and 86.07% using sentiment hashtags and non-sentiment hashtags, respectively. Additionally, our models perform comparably to supervised machine learning algorithms and outperform a scoring algorithm developed in a previous study.
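
As a rough illustration of the lexicon-based idea summarized above, the following Python sketch merges two word lists and uses the polarity of a tweet's hashtags to label the tweet. It is a minimal sketch under stated assumptions and not the method evaluated in the study: the toy lexicons, the function names, and the simple sum-of-polarities rule are all hypothetical, and non-sentiment hashtags are simply treated as neutral here.

```python
import re

# Hypothetical toy lexicons standing in for real lexical resources.
# Each maps a subjective word to +1 (positive) or -1 (negative).
LEXICON_A = {"love": 1, "great": 1, "hate": -1}
LEXICON_B = {"happy": 1, "fail": -1, "hate": -1}


def combine_lexicons(*lexicons):
    """Merge several lexicons; the first resource listing a word wins."""
    combined = {}
    for lexicon in lexicons:
        for word, polarity in lexicon.items():
            combined.setdefault(word, polarity)
    return combined


def extract_hashtags(tweet):
    """Return the lowercased hashtags of a tweet without the leading '#'."""
    return [tag.lower() for tag in re.findall(r"#(\w+)", tweet)]


def hashtag_polarity(hashtag, lexicon):
    """Return +1 or -1 for a sentiment hashtag, 0 for a non-sentiment one."""
    return lexicon.get(hashtag, 0)


def classify_tweet(tweet, lexicon):
    """Label a tweet by summing the polarities of its hashtags (assumed rule)."""
    score = sum(hashtag_polarity(tag, lexicon) for tag in extract_hashtags(tweet))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "unknown"


combined = combine_lexicons(LEXICON_A, LEXICON_B)
print(classify_tweet("Monday again #fail", combined))
print(classify_tweet("New phone #love #happy", combined))
```

In this toy setup, the hashtag `#fail` is covered only by the second word list, so a model restricted to the first list alone would leave that tweet unlabeled. This mirrors, in a very simplified way, why combining resources can widen coverage.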
