Sentiment Analysis of Short Informal Texts

We describe a state-of-the-art sentiment analysis system that detects (a) the sentiment of short informal textual messages such as tweets and SMS (message-level task) and (b) the sentiment of a word or a phrase within a message (term-level task). The system is based on a supervised statistical text classification approach leveraging a variety of surface-form, semantic, and sentiment features. The sentiment features are primarily derived from novel high-coverage tweet-specific sentiment lexicons. These lexicons are automatically generated from tweets with sentiment-word hashtags and from tweets with emoticons. To adequately capture the sentiment of words in negated contexts, a separate sentiment lexicon is generated for negated words. The system ranked first in the SemEval-2013 shared task 'Sentiment Analysis in Twitter' (Task 2), obtaining an F-score of 69.02 in the message-level task and 88.93 in the term-level task. Post-competition improvements boost the performance to an F-score of 70.45 (message-level task) and 89.50 (term-level task). The system also obtains state-of-the-art performance on two additional datasets: the SemEval-2013 SMS test set and a corpus of movie review excerpts. The ablation experiments demonstrate that the use of the automatically generated lexicons results in performance gains of up to 6.5 absolute percentage points.
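The lexicon-generation idea above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a scoring function, so the difference of pointwise mutual information (PMI) with the positive and negative classes is assumed here. The seed hashtags, the negation cue list, the `_NEG` suffix, the rule that a negated context ends at the next punctuation mark, and all function names are likewise hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical seed hashtags and negation cues; the abstract does not list them.
POSITIVE_TAGS = {"#good", "#happy"}
NEGATIVE_TAGS = {"#bad", "#sad"}
NEGATORS = {"not", "no", "never", "cannot"}
PUNCTUATION = {".", ",", ":", ";", "!", "?"}
TOKEN_RE = re.compile(r"#?\w+(?:'\w+)?|[.,:;!?]")


def tokenize(text):
    """Lowercase and split into words, hashtags, and punctuation marks."""
    return TOKEN_RE.findall(text.lower())


def label_from_hashtags(tokens):
    """Return 'pos' or 'neg' from seed hashtags, or None if absent or mixed."""
    has_pos = any(t in POSITIVE_TAGS for t in tokens)
    has_neg = any(t in NEGATIVE_TAGS for t in tokens)
    if has_pos == has_neg:
        return None
    return "pos" if has_pos else "neg"


def mark_negation(tokens):
    """Suffix _NEG to tokens between a negator and the next punctuation mark."""
    out, negated = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            negated = False
            out.append(tok)
        elif tok in NEGATORS or tok.endswith("n't"):
            negated = True
            out.append(tok)
        else:
            out.append(tok + "_NEG" if negated else tok)
    return out


def build_lexicon(tweets, smoothing=1.0):
    """Score each term as PMI(term, pos) - PMI(term, neg), with smoothing."""
    counts = {"pos": Counter(), "neg": Counter()}
    for text in tweets:
        tokens = tokenize(text)
        label = label_from_hashtags(tokens)
        if label is None:
            continue
        # Drop hashtags so the label itself does not leak into the lexicon.
        content = [t for t in tokens if not t.startswith("#")]
        marked = mark_negation(content)
        counts[label].update(t for t in marked if t not in PUNCTUATION)

    n_pos = sum(counts["pos"].values())
    n_neg = sum(counts["neg"].values())
    if n_pos == 0 or n_neg == 0:
        return {}

    lexicon = {}
    for term in set(counts["pos"]) | set(counts["neg"]):
        c_pos = counts["pos"][term] + smoothing
        c_neg = counts["neg"][term] + smoothing
        # The PMI difference reduces to this log-ratio of class-relative counts.
        lexicon[term] = math.log2((c_pos * n_neg) / (c_neg * n_pos))
    return lexicon


tweets = [
    "I love this phone #happy",
    "love the weather today #good",
    "I don't love this phone #sad",
    "terrible service, never again #bad",
]
lexicon = build_lexicon(tweets)
print(sorted(lexicon.items(), key=lambda kv: kv[1]))
```

In this sketch the negated-context entries (those ending in `_NEG`) share one dictionary with the affirmative entries; filtering on the suffix would split them into the separate negated-word lexicon the abstract describes. A positive score marks a term as associated with positive tweets, a negative score with negative tweets.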
