Twitter Sentiment Analysis

This paper covers the two main approaches to sentiment analysis: i) the lexicon-based method; ii) the machine learning method. We describe several techniques for implementing these approaches and discuss how they can be adapted for sentiment classification of Twitter messages. We present a comparative study of different lexicon combinations and show that enhancing sentiment lexicons with emoticons, abbreviations and social-media slang expressions increases the accuracy of lexicon-based classification for Twitter. We discuss the importance of the feature generation and feature selection processes for machine learning sentiment classification. To quantify the performance of the main sentiment analysis methods on Twitter, we run these algorithms on a benchmark Twitter dataset from the SemEval-2013 competition, task 2-B. The results show that the machine learning methods based on SVM and Naive Bayes classifiers outperform the lexicon method. We present a new ensemble method that uses a lexicon-based sentiment score as an input feature for the machine learning approach. The combined method produced more precise classifications. We also show that employing a cost-sensitive classifier for highly unbalanced datasets improves sentiment classification performance by up to 7%.
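The idea of feeding a lexicon-based score into a machine learning classifier can be illustrated with a minimal sketch. It is not the implementation used in the paper: the toy lexicon, its weights, the tokenizer and the function names `lexicon_score` and `features` are all assumptions made for illustration, and the score is taken to be a plain sum of token polarities.

```python
import re

# Hypothetical toy lexicon (assumption): ordinary words extended with
# emoticons and social-media slang, each with a hand-picked polarity weight.
LEXICON = {
    "good": 1.0, "great": 2.0, "bad": -1.0, "awful": -2.0,
    ":)": 1.0, ":(": -1.0, "lol": 1.0, "wtf": -1.0,
}

# Match simple two-character emoticons first, then runs of word characters.
TOKEN_RE = re.compile(r"[:;][()]|\w+")


def tokenize(tweet):
    """Lowercase the tweet and split it into words and simple emoticons."""
    return TOKEN_RE.findall(tweet.lower())


def lexicon_score(tweet):
    """Sum the polarity weights of all tokens; unknown tokens count as 0."""
    return sum(LEXICON.get(token, 0.0) for token in tokenize(tweet))


def features(tweet, vocabulary):
    """Bag-of-words counts over `vocabulary`, with the lexicon score appended
    as one extra feature for a downstream classifier (for example SVM)."""
    tokens = tokenize(tweet)
    counts = [tokens.count(word) for word in vocabulary]
    return counts + [lexicon_score(tweet)]
```

A vector such as `features("Awful battery :(", ["phone", "battery"])` could then be passed to any standard classifier alongside the usual n-gram features; the sign of `lexicon_score` on its own gives a purely lexicon-based prediction.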
