Building a Twitter opinion lexicon from automatically-annotated tweets

We propose a supervised model for expanding an opinion lexicon for Twitter.
We combine automatically annotated tweets with existing hand-made opinion lexicons.
We use POS tags and associations between words and sentiment as word-level features.
Expanded words are mapped to a probability distribution over positive, negative, and neutral classes.
Our method outperforms PMI semantic orientation used alone.

Opinion lexicons, which are lists of terms labeled by sentiment, are widely used resources for automatic sentiment analysis of text. However, existing resources of this type have limitations when applied to social media messages such as tweets (posts on Twitter), because they fail to capture the diversity of informal expressions common in this type of media.

In this article, we present a method that combines information from automatically annotated tweets and existing hand-made opinion lexicons to expand an opinion lexicon in a supervised fashion. The expanded lexicon contains part-of-speech (POS) disambiguated entries, each with a probability distribution over the positive, negative, and neutral polarity classes, similar to SentiWordNet.

To obtain this distribution using machine learning, we propose word-level attributes based on (a) the morphological information conveyed by POS tags and (b) associations between words and the sentiment expressed in the tweets that contain them. We consider tweets with both hard and soft sentiment labels. The sentiment associations are modeled in two ways: using pointwise mutual information semantic orientation (PMI-SO), and using stochastic gradient descent semantic orientation (SGD-SO), which learns a linear relationship between words and sentiment.
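As a rough illustration of the two association measures, the following Python sketch computes PMI-SO in its usual form, PMI(w, pos) - PMI(w, neg), and an SGD-SO score taken as the weight a linear model assigns to each word after being trained on the labeled tweets. The function names, tweet-level counting, 0.5 smoothing, squared loss, and hyperparameters are assumptions for illustration, not the article's exact settings, and only hard labels are handled.

```python
import math
import random
from collections import Counter, defaultdict


def pmi_so(tweets, labels):
    """PMI-SO(w) = PMI(w, pos) - PMI(w, neg), with hard "pos"/"neg" labels.

    Assumptions: counts are tweet-level (a word counts once per tweet), both
    classes are present, and 0.5 is added to each word/class count to avoid
    log(0).
    """
    n_pos = sum(1 for y in labels if y == "pos")
    n_neg = len(labels) - n_pos
    pos_counts, neg_counts = Counter(), Counter()
    for tokens, y in zip(tweets, labels):
        (pos_counts if y == "pos" else neg_counts).update(set(tokens))
    vocab = set(pos_counts) | set(neg_counts)
    # The corpus size N and freq(w) cancel in the difference of the two PMIs:
    # log2(freq(w,pos) * freq(neg) / (freq(w,neg) * freq(pos))).
    return {
        w: math.log2((pos_counts[w] + 0.5) * n_neg / ((neg_counts[w] + 0.5) * n_pos))
        for w in vocab
    }


def sgd_so(tweets, labels, epochs=10, lr=0.05, l2=1e-4, seed=0):
    """SGD-SO sketch: the weight a linear model learns for each word.

    Assumptions: binary word-presence features, target +1 ("pos") / -1 ("neg"),
    squared loss with L2 regularization, plain per-tweet SGD updates.
    """
    rng = random.Random(seed)
    data = [(set(t), 1.0 if y == "pos" else -1.0) for t, y in zip(tweets, labels)]
    weights, bias = defaultdict(float), 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for feats, target in data:
            pred = bias + sum(weights[f] for f in feats)
            err = pred - target  # derivative of 0.5 * (pred - target) ** 2
            bias -= lr * err
            for f in feats:
                # L2 penalty applied lazily, only to words present in this tweet
                weights[f] -= lr * (err + l2 * weights[f])
    return dict(weights)
```

On a toy corpus where a word occurs only in positive tweets, both functions give it a positive score, and with balanced classes a word split evenly between them gets a PMI-SO of zero. Unlike PMI-SO, which scores each word independently, the SGD-SO weights are fitted jointly, so a word's score also depends on the other words it co-occurs with.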
The training words are labeled using a seed lexicon formed by combining multiple hand-annotated lexicons.

Our experimental results show that, for three-class (positive, negative, neutral) word-level polarity classification, our method outperforms PMI-SO used on its own. This matters because PMI-SO is a state-of-the-art measure of word-level sentiment. Additionally, we show that lexicons created with our method achieve significant improvements over SentiWordNet for classifying tweets into polarity classes, and also outperform SentiStrength in most of the experiments.
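The supervised expansion step can be sketched roughly as follows: each POS-tagged word gets a feature vector made of its PMI-SO and SGD-SO scores plus a one-hot encoding of its POS tag, words found in the seed lexicon supply the training labels, and a classifier outputs a positive/negative/neutral distribution for every remaining word. The multinomial logistic regression used here, the coarse tagset `POS_TAGS`, the feature layout, and the hyperparameters are all hypothetical choices for illustration, not the classifier or settings reported in the article.

```python
import math

CLASSES = ("positive", "negative", "neutral")
POS_TAGS = ("A", "N", "R", "V")  # hypothetical coarse tagset


def featurize(entry, assoc):
    """entry is a (word, tag) pair; assoc maps entry -> (pmi_so, sgd_so)."""
    _, tag = entry
    pmi, sgd = assoc[entry]
    one_hot = [1.0 if tag == t else 0.0 for t in POS_TAGS]
    return [pmi, sgd] + one_hot + [1.0]  # trailing 1.0 acts as the bias input


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_probs(W, x):
    return softmax([sum(wj * xj for wj, xj in zip(row, x)) for row in W])


def train_softmax(X, y, epochs=500, lr=0.1, l2=1e-3):
    """Multinomial logistic regression fit by full-batch gradient descent.

    The L2 penalty is applied to every weight, including the bias (a simplification).
    """
    n_feat = len(X[0])
    W = [[0.0] * n_feat for _ in CLASSES]
    for _ in range(epochs):
        grad = [[0.0] * n_feat for _ in CLASSES]
        for x, label in zip(X, y):
            probs = class_probs(W, x)
            for k, cls in enumerate(CLASSES):
                # Cross-entropy gradient: (predicted prob - one-hot target) * x
                diff = probs[k] - (1.0 if cls == label else 0.0)
                for j, xj in enumerate(x):
                    grad[k][j] += diff * xj
        for k in range(len(CLASSES)):
            for j in range(n_feat):
                W[k][j] -= lr * (grad[k][j] / len(X) + l2 * W[k][j])
    return W


def expand_lexicon(assoc, seed, **train_kw):
    """Train on seed-labeled entries; return a class distribution for the rest."""
    labeled = [e for e in assoc if e in seed]
    W = train_softmax([featurize(e, assoc) for e in labeled],
                      [seed[e] for e in labeled], **train_kw)
    return {e: dict(zip(CLASSES, class_probs(W, featurize(e, assoc))))
            for e in assoc if e not in seed}
```

Keying entries by (word, tag) pairs is what makes the expanded lexicon POS-disambiguated: the same surface word under two different tags receives two separate distributions, in the spirit of SentiWordNet's per-entry scores.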