A Simple and Efficient Algorithm for Lexicon Generation Inspired by Structural Balance Theory

Sentiment lexicon generation is a major task in the field of Sentiment Analysis. In contrast to the bulk of research, which has focused almost exclusively on Label Propagation as the primary tool for lexicon generation, we introduce a simple yet efficient algorithm for lexicon generation that is inspired by Structural Balance Theory. Our algorithm is shown to outperform the classical Label Propagation algorithm. A major drawback of Label Propagation is that words situated many hops away from the seed words tend to get low sentiment values, since the inaccuracy in the synonym relationship is not properly taken into account. In fact, the label of a word is simply the average of the labels of its neighbours. To circumvent this problem, we propose a novel algorithm that supports better transitive transfer of sentiment polarity from seed words to target words using Structural Balance Theory. The premise of the algorithm is exemplified by the principle "the enemy of my enemy is my friend", which preserves the transitivity structure captured by antonyms and synonyms. Thus, a low sentiment score indicates sentimental neutrality rather than the fact that the word in question is located far from the seeds. Lexicons based on thesauruses were built using different variants of our proposed algorithm. The lexicons were evaluated by classifying product and movie reviews, and the results show satisfying classification performance that outperforms Label Propagation. We consider Norwegian as a case study, but the algorithm can easily be applied to other languages.
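The "enemy of my enemy is my friend" premise can be illustrated with a small sketch. It is not the algorithm proposed in the paper, whose details the abstract does not give. It only shows one simple way a seed polarity could be carried along synonym (+1) and antonym (-1) edges without shrinking with distance. The function name `propagate_signs`, the edge encoding, and the English example words are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical edge signs: synonyms preserve polarity, antonyms flip it.
SYNONYM, ANTONYM = 1, -1


def propagate_signs(edges, seeds):
    """Breadth-first sign transfer over a signed thesaurus graph.

    edges: iterable of (word_a, word_b, sign) with sign in {+1, -1}
    seeds: dict mapping seed words to a polarity of +1 or -1
    Returns a dict mapping every reachable word to +1 or -1.
    """
    # Build an undirected adjacency list that keeps the edge sign.
    graph = {}
    for a, b, sign in edges:
        graph.setdefault(a, []).append((b, sign))
        graph.setdefault(b, []).append((a, sign))

    polarity = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        for neighbour, sign in graph.get(word, []):
            if neighbour not in polarity:
                # Polarity is the parent's polarity times the edge sign,
                # so two antonym hops give back the original polarity.
                polarity[neighbour] = polarity[word] * sign
                queue.append(neighbour)
    return polarity


# Toy English thesaurus (illustrative only; the paper's case study is Norwegian).
edges = [
    ("good", "fine", SYNONYM),
    ("good", "bad", ANTONYM),
    ("bad", "awful", SYNONYM),
    ("awful", "excellent", ANTONYM),
]
polarity = propagate_signs(edges, {"good": 1})
print(polarity)
```

In this toy graph, "excellent" is three hops from the seed "good" and is reached through two antonym edges, so it ends up with the same full-magnitude polarity as the seed instead of a value diluted by distance. A real lexicon builder would also have to handle conflicting paths and noisy thesaurus links, which this sketch ignores.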
