A polarity analysis framework for Twitter messages

Social media platforms such as Twitter and Facebook allow people, companies and brands to create, share and exchange information. This information can serve several purposes, such as understanding consumers and their preferences. In this context, sentiment analysis can be used as a feedback mechanism: it classifies a text according to the sentiment the writer intended to convey. A basic sentiment classifier determines the sentiment polarity (negative, neutral or positive) of a given text at the document, sentence, or feature/aspect level. More advanced classifiers may consider other elements, such as emotional states (e.g. angry, sad, happy), affective states (e.g. pleasure and pain), motivational states (e.g. hunger and curiosity) and temperaments. In general, there are two main approaches to attributing sentiment to tweets: knowledge-based approaches and approaches based on machine learning algorithms. In the latter case, the learning algorithm requires a pre-classified data sample to determine the class of new data. Typically, the sample is pre-classified manually, which makes the process time-consuming and reduces its real-time applicability to big data. This paper proposes a polarity analysis framework for Twitter messages that combines both approaches with an automatic contextual module. To assess the performance of the proposed framework, four text datasets from the literature are used. The classifiers considered include Naive Bayes (NB), Support Vector Machines (SVM), Decision Trees (J48) and Nearest Neighbors (KNN). The results show that the proposal is a suitable framework for automating the whole polarity analysis process, providing high accuracy levels and low false positive rates.
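As a rough illustration of how a knowledge-based step could remove the manual pre-classification that a learning algorithm otherwise needs, the sketch below labels a few tweets with a word lexicon and then trains a small Naive Bayes classifier on those automatic labels. This is not the framework's actual implementation: the lexicon, the example tweets, and the names `LEXICON`, `lexicon_label` and `NaiveBayes` are all assumptions made for the example, and the contextual module is not modeled.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lexicon (assumption); a real system would use a much
# larger lexical resource.
LEXICON = {"good": 1, "great": 1, "love": 1, "happy": 1,
           "bad": -1, "awful": -1, "hate": -1, "sad": -1}


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()


def lexicon_label(text):
    """Knowledge-based step: sum word polarities and map the sign to a class."""
    score = sum(LEXICON.get(tok, 0) for tok in tokenize(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.class_counts.items():
            logp = math.log(n_docs / total_docs)  # class prior
            total_words = sum(self.word_counts[label].values())
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][tok]
                logp += math.log((count + 1) / (total_words + len(self.vocab)))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Made-up, unlabeled tweets (assumption); the lexicon labels them
# automatically, so no manual annotation is needed to train the classifier.
tweets = [
    "i love this phone great camera",
    "awful battery i hate it",
    "great service happy customer",
    "bad screen sad purchase",
    "the store opens at nine",
]
labels = [lexicon_label(t) for t in tweets]
model = NaiveBayes().fit(tweets, labels)

print(model.predict("love the camera"))
print(model.predict("hate the battery"))
```

The trained classifier can assign a polarity to words the lexicon does not cover (for instance "camera" or "battery") because it learns from their co-occurrence with lexicon words. That is the usual motivation for combining the two approaches, although the automatic labels inherit any errors the lexicon makes.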
