A Language Independent Method for Generating Large Scale Polarity Lexicons

Sentiment Analysis systems aims at detecting opinions and sentiments that are expressed in texts. Many approaches in literature are based on resources that model the prior polarity of words or multi-word expressions, i.e. a polarity lexicon. Such resources are defined by teams of annotators, i.e. a manual annotation is provided to associate emotional or sentiment facets to the lexicon entries. The development of such lexicons is an expensive and language dependent process, making them often not covering all the linguistic sentiment phenomena. Moreover, once a lexicon is defined it can hardly be adopted in a different language or even a different domain. In this paper, we present several Distributional Polarity Lexicons (DPLs), i.e. large-scale polarity lexicons acquired with an unsupervised methodology based on Distributional Models of Lexical Semantics. Given a set of heuristically annotated sentences from Twitter, we transfer the sentiment information from sentences to words. The approach is mostly unsupervised, and experimental evaluations on Sentiment Analysis tasks in two languages show the benefits of the generated resources. The generated DPLs are publicly available in English and Italian.

[1]  Preslav Nakov,et al.  SemEval-2013 Task 2: Sentiment Analysis in Twitter , 2013, *SEMEVAL.

[2]  Michael L. Littman,et al.  Measuring praise and criticism: Inference of semantic orientation from association , 2003, TOIS.

[3]  Georgiana Dinu,et al.  Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors , 2014, ACL.

[4]  Nello Cristianini,et al.  Kernel Methods for Pattern Analysis , 2004 .

[5]  Harith Alani,et al.  SentiCircles for Contextual and Conceptual Semantic Sentiment Analysis of Twitter , 2014, ESWC.

[6]  Janyce Wiebe,et al.  Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis , 2005, HLT.

[7]  T. Landauer,et al.  A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge. , 1997 .

[8]  Preslav Nakov,et al.  SemEval-2014 Task 9: Sentiment Analysis in Twitter , 2014, *SEMEVAL.

[9]  Roberto Basili,et al.  KeLP: a Kernel-based Learning Platform for Natural Language Processing , 2015, ACL.

[10]  Zhe Zhang,et al.  ReNew: A Semi-Supervised Framework for Generating Domain-Specific Lexicons and Sentiment Analysis , 2014, ACL.

[11]  Roberto Basili,et al.  Parsing engineering and empirical robustness , 2002, Natural Language Engineering.

[12]  Bing Liu,et al.  Mining and summarizing customer reviews , 2004, KDD.

[13]  Roberto Basili,et al.  Acquiring a Large Scale Polarity Lexicon Through Unsupervised Distributional Methods , 2015, NLDB.

[14]  Magnus Sahlgren,et al.  The Word-Space Model: using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces , 2006 .

[15]  Andrea Esuli,et al.  SENTIWORDNET: A Publicly Available Lexical Resource for Opinion Mining , 2006, LREC.

[16]  Saif Mohammad,et al.  Sentiment Analysis of Short Informal Texts , 2014, J. Artif. Intell. Res..

[17]  Malvina Nissim,et al.  Sentiment analysis on Italian tweets , 2013, WASSA@NAACL-HLT.

[18]  Peter D. Turney,et al.  Emotions Evoked by Common Words and Phrases: Using Mechanical Turk to Create an Emotion Lexicon , 2010, HLT-NAACL 2010.

[19]  Geoffrey Zweig,et al.  Linguistic Regularities in Continuous Space Word Representations , 2013, NAACL.

[20]  Lillian Lee,et al.  Opinion Mining and Sentiment Analysis , 2008, Found. Trends Inf. Retr..

[21]  Gene H. Golub,et al.  Calculating the singular values and pseudo-inverse of a matrix , 2007, Milestones in Matrix Computation.

[22]  Malvina Nissim,et al.  Overview of the Evalita 2014 SENTIment POLarity Classification Task , 2014 .

[23]  Delip Rao,et al.  Semi-Supervised Polarity Lexicon Induction , 2009, EACL.

[24]  George A. Miller,et al.  WordNet: A Lexical Database for English , 1995, HLT.

[25]  Zellig S. Harris,et al.  Distributional Structure , 1954 .

[26]  Yoshua Bengio,et al.  A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res..

[27]  Marshall S. Smith,et al.  The general inquirer: A computer approach to content analysis. , 1967 .

[28]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[29]  Philip J. Stone,et al.  Extracting Information. (Book Reviews: The General Inquirer. A Computer Approach to Content Analysis) , 1967 .