SenSALDO: a Swedish Sentiment Lexicon for the SWE-CLARIN Toolbox

Sentiment analysis, or opinion mining, is the task of automatically classifying text according to the positive or negative sentiment it expresses, and it has become very popular in the last decade. However, most data and software resources are built for English and a few other languages. In this paper we compare and test different corpus-based and lexicon-based methods for creating a sentiment lexicon, and then manually curate the results of the best-performing method. The result, SenSALDO, is a comprehensive sentiment lexicon for Swedish containing 7,618 word senses, together with a full-form version of the lexicon containing 65,953 items (text word forms). SenSALDO is freely available as a research tool in the SWE-CLARIN toolbox under an open-source CC-BY license.
