Twitter Sentiment Analysis

This paper covers the two approaches for sentiment analysis: i) lexicon based method; ii) machine learning method. We describe several techniques to implement these approaches and discuss how they can be adopted for sentiment classification of Twitter messages. We present a comparative study of different lexicon combinations and show that enhancing sentiment lexicons with emoticons, abbreviations and social-media slang expressions increases the accuracy of lexicon-based classification for Twitter. We discuss the importance of feature generation and feature selection processes for machine learning sentiment classification. To quantify the performance of the main sentiment analysis methods over Twitter we run these algorithms on a benchmark Twitter dataset from the SemEval-2013 competition, task 2-B. The results show that machine learning method based on SVM and Naive Bayes classifiers outperforms the lexicon method. We present a new ensemble method that uses a lexicon based sentiment score as input feature for the machine learning approach. The combined method proved to produce more precise classifications. We also show that employing a cost-sensitive classifier for highly unbalanced datasets yields an improvement of sentiment classification performance up to 7%.

[1]  Roberto Basili,et al.  Language sensitive text classification , 2000, RIAO.

[2]  Diego Reforgiato Recupero,et al.  Sentiment Analysis: Adjectives and Adverbs are Better than Adjectives Alone , 2007, ICWSM.

[3]  Saif Mohammad,et al.  NRC-Canada: Building the State-of-the-Art in Sentiment Analysis of Tweets , 2013, *SEMEVAL.

[4]  Ellen Riloff,et al.  Creating Subjective and Objective Sentence Classifiers from Unannotated Texts , 2005, CICLing.

[5]  Thorsten Joachims,et al.  Transductive Inference for Text Classification using Support Vector Machines , 1999, ICML.

[6]  C. J. van Rijsbergen,et al.  Information Retrieval , 1979, Encyclopedia of GIS.

[7]  George Forman,et al.  An Extensive Empirical Study of Feature Selection Metrics for Text Classification , 2003, J. Mach. Learn. Res..

[8]  Elisabetta Fersini,et al.  Enhance User-Level Sentiment Analysis on Microblogs with Approval Relations , 2013, AI*IA.

[9]  Uzay Kaymak,et al.  Exploiting emoticons in sentiment analysis , 2013, SAC '13.

[10]  Alexandre Plastino,et al.  A Statistical and Evolutionary Approach to Sentiment Analysis , 2014, 2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT).

[11]  Stephen R. Marsland,et al.  Machine Learning - An Algorithmic Perspective , 2009, Chapman and Hall / CRC machine learning and pattern recognition series.

[12]  Brendan T. O'Connor,et al.  Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments , 2010, ACL.

[13]  Bernardo A. Huberman,et al.  Predicting the Future with Social Media , 2010, Web Intelligence.

[14]  Gerard Salton,et al.  The SMART Retrieval System—Experiments in Automatic Document Processing , 1971 .

[15]  Adam Kowalczyk,et al.  Second Order Features for Maximising Text Classification Performance , 2001, ECML.

[16]  Satoshi Morinaga,et al.  Mining product reputations on the Web , 2002, KDD.

[17]  Soo-Min Kim,et al.  Determining the Sentiment of Opinions , 2004, COLING.

[18]  Thomas G. Dietterich What is machine learning? , 2020, Archives of Disease in Childhood.

[19]  S. Piantadosi Zipf’s word frequency law in natural language: A critical review and future directions , 2014, Psychonomic Bulletin & Review.

[20]  Philip J. Stone,et al.  A computer approach to content analysis: studies using the General Inquirer system , 1963, AFIPS Spring Joint Computing Conference.

[21]  Peter D. Turney Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews , 2002, ACL.

[22]  Elisabetta Fersini,et al.  Expressive signals in social media languages to improve polarity detection , 2016, Inf. Process. Manag..

[23]  David E. Johnson,et al.  Maximizing Text-Mining Performance , 1999 .

[24]  Junlan Feng,et al.  Robust Sentiment Detection on Twitter from Biased and Noisy Data , 2010, COLING.

[25]  David Zimbra,et al.  Twitter brand sentiment analysis: A hybrid system using n-gram analysis and dynamic artificial neural network , 2013, Expert Syst. Appl..

[26]  Michael McGill,et al.  Introduction to Modern Information Retrieval , 1983 .

[27]  Sri Ramakrishna,et al.  FEATURE SELECTION METHODS AND ALGORITHMS , 2011 .

[28]  Julie Beth Lovins,et al.  Development of a stemming algorithm , 1968, Mech. Transl. Comput. Linguistics.

[29]  Bruno Ohana,et al.  Sentiment Classification of Reviews Using SentiWordNet , 2009 .

[30]  Elisabetta Fersini,et al.  Enhance Polarity Classification on Social Media through Sentiment-based Feature Expansion , 2013, WOA@AI*IA.

[31]  Stan Matwin,et al.  A learner-independent evaluation of the usefulness of statistical phrases for automated text categorization , 2001 .

[32]  Maite Taboada,et al.  Lexicon-Based Methods for Sentiment Analysis , 2011, CL.

[33]  Andrea Esuli,et al.  SENTIWORDNET: A Publicly Available Lexical Resource for Opinion Mining , 2006, LREC.

[34]  Johan Bollen,et al.  Twitter mood predicts the stock market , 2010, J. Comput. Sci..

[35]  Martin F. Porter,et al.  An algorithm for suffix stripping , 1997, Program.

[36]  Vivek Narayanan,et al.  Fast and Accurate Sentiment Classification Using an Enhanced Naive Bayes Model , 2013, IDEAL.

[37]  Preslav Nakov,et al.  SemEval-2013 Task 2: Sentiment Analysis in Twitter , 2013, *SEMEVAL.

[38]  Huan Liu,et al.  Exploiting social relations for sentiment analysis in microblogging , 2013, WSDM.

[39]  Isaac G. Councill,et al.  What's great and what's not: learning to classify the scope of negation for improved sentiment analysis , 2010, NeSp-NLP@ACL.

[40]  Victor S. Sheng,et al.  Cost-Sensitive Learning and the Class Imbalance Problem , 2008 .

[41]  L. Ladha,et al.  FEATURE SELECTION METHODS AND ALGORITHMS , 2011 .

[42]  Dell Zhang,et al.  Question classification using support vector machines , 2003, SIGIR.

[43]  Janyce Wiebe,et al.  Learning to Disambiguate Potentially Subjective Expressions , 2002, CoNLL.

[44]  Kristina Lerman,et al.  Tripartite graph clustering for dynamic sentiment analysis on social media , 2014, SIGMOD Conference.

[45]  Johanna D. Moore,et al.  Twitter Sentiment Analysis: The Good the Bad and the OMG! , 2011, ICWSM.

[46]  J. J. Rocchio,et al.  Relevance feedback in information retrieval , 1971 .

[47]  Minyi Guo,et al.  Emoticon Smoothed Language Models for Twitter Sentiment Analysis , 2012, AAAI.

[48]  Michael J. Cafarella,et al.  Using Social Media to Measure Labor Market Flows , 2014 .

[49]  Philip C. Treleaven,et al.  Twitter Sentiment Analysis Applied to Finance: A Case Study in the Retail Industry , 2015, ArXiv.

[50]  Sotiris B. Kotsiantis,et al.  Supervised Machine Learning: A Review of Classification Techniques , 2007, Informatica.

[51]  Janyce Wiebe,et al.  Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis , 2005, HLT.

[52]  Paul S. Jacobs,et al.  Joining Statistics with NLP for Text Categorization , 1992, ANLP.

[53]  Vaibhavi N Patodkar,et al.  Twitter as a Corpus for Sentiment Analysis and Opinion Mining , 2016 .

[54]  Huan Liu,et al.  Feature Selection for Classification , 1997, Intell. Data Anal..

[55]  Annie Zaenen,et al.  Contextual Valence Shifters , 2006, Computing Attitude and Affect in Text.

[56]  Jörg Kindermann,et al.  Authorship Attribution with Support Vector Machines , 2003, Applied Intelligence.

[57]  Masaru Kitsuregawa,et al.  Building Lexicon for Sentiment Analysis from Massive Collection of HTML Documents , 2007, EMNLP.

[58]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[59]  Claire Cardie,et al.  Annotating Expressions of Opinions and Emotions in Language , 2005, Lang. Resour. Evaluation.

[60]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[61]  George Papadakis,et al.  Content vs. context for sentiment analysis: a comparative analysis over microblogs , 2012, HT '12.

[62]  Philip S. Yu,et al.  A holistic lexicon-based approach to opinion mining , 2008, WSDM '08.

[63]  Yuan-Fang Wang,et al.  The use of bigrams to enhance text categorization , 2002, Inf. Process. Manag..

[64]  Finn Årup Nielsen,et al.  A New ANEW: Evaluation of a Word List for Sentiment Analysis in Microblogs , 2011, #MSM.

[65]  Jian Ma,et al.  Sentiment classification: The contribution of ensemble learning , 2014, Decis. Support Syst..

[66]  Jiebo Luo,et al.  Towards social imagematics: sentiment analysis in social multimedia , 2013, MDMKDD '13.

[67]  Bo Pang,et al.  Thumbs up? Sentiment Classification using Machine Learning Techniques , 2002, EMNLP.

[68]  Karo Moilanen,et al.  Sentiment Composition , 2007 .

[69]  Sotiris Kotsiantis,et al.  Text Classification Using Machine Learning Techniques , 2005 .

[70]  Harith Alani,et al.  Semantic Sentiment Analysis of Twitter , 2012, SEMWEB.

[71]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[72]  Ke Xu,et al.  MoodLens: an emoticon-based sentiment analysis system for chinese tweets , 2012, KDD.

[73]  Kathleen R. McKeown,et al.  Predicting the semantic orientation of adjectives , 1997 .

[74]  Janyce Wiebe,et al.  Learning Subjective Adjectives from Corpora , 2000, AAAI/IAAI.

[75]  Din J. Wasem,et al.  Mining of Massive Datasets , 2014 .

[76]  Elisabetta Fersini,et al.  Sentiment analysis: Bayesian Ensemble Learning , 2014, Decis. Support Syst..

[77]  Yoram Singer,et al.  BoosTexter: A Boosting-based System for Text Categorization , 2000, Machine Learning.

[78]  Shi Bing,et al.  Inductive learning algorithms and representations for text categorization , 2006 .