Generating High-Coverage Semantic Orientation Lexicons From Overtly Marked Words and a Thesaurus

Sentiment analysis often relies on a semantic orientation lexicon of positive and negative words. A number of approaches have been proposed for creating such lexicons, but they tend to be computationally expensive, and usually rely on significant manual annotation and large corpora. Most of these methods use WordNet. In contrast, we propose a simple approach to generate a high-coverage semantic orientation lexicon, which includes both individual words and multi-word expressions, using only a Roget-like thesaurus and a handful of affixes. Further, the lexicon has properties that support the Pollyanna Hypothesis. Using the General Inquirer as the gold standard, we show that our lexicon has 14 percentage points more correct entries than the leading WordNet-based high-coverage lexicon (SentiWordNet). In an extrinsic evaluation, we obtain significantly higher performance in determining phrase polarity using our thesaurus-based lexicon than with any other. Additionally, we explore the use of visualization techniques to gain insight into our algorithm beyond the evaluations mentioned above.
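The abstract does not spell out the procedure, so the following is only a rough sketch of how affix-marked word pairs and a thesaurus might be combined into an orientation lexicon. The prefix list, the function names, the toy thesaurus, and the majority-vote propagation rule are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter

# Hypothetical negating prefixes; the paper's actual affix list is not given here.
NEGATING_PREFIXES = ("dis", "un", "im", "in")


def seed_pairs(vocabulary):
    """Find overtly marked pairs: if both w and prefix+w occur,
    treat w as positive (+1) and prefix+w as negative (-1)."""
    vocab = set(vocabulary)
    seeds = {}
    for word in sorted(vocab):  # sorted for deterministic behaviour
        for prefix in NEGATING_PREFIXES:
            marked = prefix + word
            if marked in vocab:
                seeds.setdefault(word, 1)
                seeds[marked] = -1
    return seeds


def label_paragraphs(thesaurus, seeds):
    """Assumed propagation rule: each thesaurus paragraph takes the majority
    orientation of the seed words it contains, and passes it to all its entries."""
    collected = {}
    for words in thesaurus.values():
        votes = Counter(seeds[w] for w in words if w in seeds)
        if votes[1] == votes[-1]:
            continue  # no seeds or a tie: leave the paragraph unlabeled
        label = 1 if votes[1] > votes[-1] else -1
        for w in words:
            collected.setdefault(w, []).append(label)
    # A word listed in several paragraphs keeps the sign of its summed labels.
    return {w: (1 if sum(ls) > 0 else -1)
            for w, ls in collected.items() if sum(ls) != 0}


# Toy Roget-like thesaurus (invented data), including multi-word expressions.
thesaurus = {
    "p1": ["honest", "truthful", "above board"],
    "p2": ["dishonest", "deceitful", "two-faced"],
    "p3": ["happy", "glad", "cheerful"],
    "p4": ["unhappy", "miserable"],
}
vocabulary = [w for words in thesaurus.values() for w in words]

seeds = seed_pairs(vocabulary)
lexicon = label_paragraphs(thesaurus, seeds)
```

In this toy run only four words are seeded by the prefixes, yet every entry in a labeled paragraph, including expressions such as "above board", receives an orientation. That is the sense in which a thesaurus can give high coverage from very few overt markers.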
