Building sentiment lexicons applying graph theory on information from three Norwegian thesauruses

Sentiment lexicons are the most widely used tool for automatically predicting sentiment in text. To the best of our knowledge, no openly available sentiment lexicons exist for the Norwegian language. In this paper we therefore applied two different strategies to automatically generate sentiment lexicons for Norwegian. The first strategy used machine translation to translate an English sentiment lexicon into Norwegian. The second strategy used information from three different thesauruses to build several sentiment lexicons, using the label propagation algorithm from graph theory. The lexicons were evaluated by classifying product and movie reviews, and the results show satisfactory classification performance. The lexicons that perform well on product reviews differ from those that perform well on movie reviews. Overall, the lexicon based on machine translation performed best, showing that linguistic resources in English can be translated to Norwegian without losing significant value.
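To illustrate the general idea behind label propagation on a thesaurus graph, the following is a minimal sketch and not the implementation used in the paper. It assumes an undirected graph whose nodes are words and whose edges are synonym links, a small set of seed words with known polarity, and a simplified update rule in which each unlabelled word repeatedly takes the average score of its neighbours while the seeds stay fixed. The function name `propagate_labels`, the toy word chain, and the seed scores are all hypothetical and are not taken from the three thesauruses.

```python
def propagate_labels(edges, seeds, iterations=100):
    """Spread seed polarities over an undirected synonym graph.

    edges: iterable of (word_a, word_b) synonym links (hypothetical input format)
    seeds: dict mapping seed words to a polarity score in [-1, 1]
    """
    # Build an adjacency map from the edge list.
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # Unlabelled words start neutral; seed words start at their known polarity.
    scores = {word: float(seeds.get(word, 0.0)) for word in neighbours}

    for _ in range(iterations):
        updated = {}
        for word, adjacent in neighbours.items():
            if word in seeds:
                # Seed words are clamped to their known polarity.
                updated[word] = float(seeds[word])
            else:
                # Unlabelled words take the mean score of their neighbours.
                updated[word] = sum(scores[n] for n in adjacent) / len(adjacent)
        scores = updated
    return scores


# Toy chain of words, made up for illustration only.
edges = [
    ("god", "bra"),
    ("bra", "fin"),
    ("fin", "grei"),
    ("grei", "svak"),
    ("svak", "elendig"),
]
seeds = {"god": 1.0, "elendig": -1.0}
lexicon = propagate_labels(edges, seeds)
```

In this sketch, words closer to the positive seed end up with higher scores than words closer to the negative seed, so the sign of a score can serve as the word's polarity in the resulting lexicon.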
