Developing a successful SemEval task in sentiment analysis of Twitter and other social media texts

We present the development and evaluation of a semantic analysis task that lies at the intersection of two very trendy lines of research in contemporary computational linguistics: (1) sentiment analysis, and (2) natural language processing of social media text. The task was part of SemEval, the International Workshop on Semantic Evaluation, a semantic evaluation forum previously known as SensEval. The task ran in 2013 and 2014, attracting the highest number of participating teams at SemEval in both years, and there is an ongoing edition in 2015. The task included the creation of a large corpus annotated for both contextual and message-level polarity, consisting of tweets, SMS messages, LiveJournal messages, and a special test set of sarcastic tweets. The evaluation attracted 44 teams in 2013 and 46 in 2014, who used a variety of approaches. The best teams outperformed several baselines by sizable margins, and results improved over the two years the task has been run. We hope that, in the long run, this task and the accompanying datasets will serve as a test bed for comparing different approaches, thus facilitating research.
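The abstract does not spell out how systems were compared against the baselines. The following is a minimal sketch that assumes the scoring used in the SemEval Twitter sentiment tasks: the average of the F1 scores for the positive and negative classes, with neutral not averaged in. It pairs that score with a majority-class baseline. The function names, label strings, and toy data are hypothetical and are for illustration only. This is not the official scorer.

```python
from collections import Counter


def f1_for_class(gold, pred, label):
    """F1 (harmonic mean of precision and recall) for one class label."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    predicted = sum(1 for p in pred if p == label)
    actual = sum(1 for g in gold if g == label)
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def semeval_score(gold, pred):
    """Assumed task score: mean of positive-class F1 and negative-class F1."""
    return (f1_for_class(gold, pred, "positive")
            + f1_for_class(gold, pred, "negative")) / 2


def majority_baseline(train_labels, n):
    """Hypothetical baseline: predict the most frequent training label n times."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return [most_common] * n


# Toy message-level labels (hypothetical data, not from the task corpus).
gold = ["positive", "negative", "neutral", "positive", "neutral"]
pred = ["positive", "neutral", "neutral", "positive", "negative"]
print(semeval_score(gold, pred))  # 0.5

baseline = majority_baseline(["neutral", "neutral", "positive"], len(gold))
print(semeval_score(gold, baseline))  # 0.0
```

Suppose a majority-class baseline always predicts "neutral". Under this averaged score it gets 0.0, because it never produces a positive or negative prediction. This is one reason a measure of this kind rewards systems that actually detect polarity rather than defaulting to the most frequent class.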
