SenSALDO: Creating a Sentiment Lexicon for Swedish

Sentiment analysis, also known as opinion mining, is a subfield of natural language processing that has expanded rapidly over the last decade or so and has become a standard item in the NLP toolbox. Still, many theoretical and methodological questions remain unanswered, and many resource gaps remain unfilled. Most work on automated sentiment analysis has been done on English and a few other languages; for most of the world's written languages, no such tool is available. This paper describes the development of an extensive sentiment lexicon for written (standard) Swedish. We investigate several methods for building the lexicon, using an existing gold standard dataset for training and testing. To each word sense in the SALDO Swedish lexicon we assign a real-valued sentiment score in the range [-1, 1] and produce a sentiment label. We implement and evaluate three methods: a graph-based method that iterates over the SALDO structure, a method based on random paths over the SALDO structure, and a corpus-driven method based on word embeddings. The resulting sense-disambiguated sentiment lexicon, SenSALDO, is an open-source resource freely available from Språkbanken, the Swedish Language Bank at the University of Gothenburg.
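
To make the idea of scoring word senses over a lexical graph more concrete, the following is a minimal sketch of iterative score propagation from a few seed senses. It is not the algorithm evaluated in the paper: the toy graph, the sense identifiers, the seed values, the averaging rule, and the labelling threshold of 0.25 are all assumptions chosen for illustration, and the function names `propagate_sentiment` and `to_label` are hypothetical.

```python
from typing import Dict, List

# Toy graph of word senses (illustrative only; not real SALDO data).
# Each sense maps to the senses it is linked to.
TOY_GRAPH: Dict[str, List[str]] = {
    "bra..1": ["utmärkt..1"],      # "good"
    "utmärkt..1": ["bra..1"],      # "excellent"
    "dålig..1": ["usel..1"],       # "bad"
    "usel..1": ["dålig..1"],       # "lousy"
    "bord..1": [],                 # "table", no links in this toy graph
}

# Seed senses with known scores in [-1, 1] (assumed values).
TOY_SEEDS: Dict[str, float] = {"bra..1": 1.0, "dålig..1": -1.0}


def propagate_sentiment(
    graph: Dict[str, List[str]],
    seeds: Dict[str, float],
    iterations: int = 20,
) -> Dict[str, float]:
    """Spread seed scores over the graph by repeated neighbour averaging.

    Seed senses keep their score; every other sense takes the mean score
    of its neighbours, so all values stay within [-1, 1].
    """
    scores = {node: seeds.get(node, 0.0) for node in graph}
    for _ in range(iterations):
        updated = {}
        for node, neighbours in graph.items():
            if node in seeds:
                updated[node] = seeds[node]
            elif neighbours:
                total = sum(scores.get(n, 0.0) for n in neighbours)
                updated[node] = total / len(neighbours)
            else:
                updated[node] = 0.0
        scores = updated
    return scores


def to_label(score: float, threshold: float = 0.25) -> str:
    """Map a score in [-1, 1] to a coarse label (threshold is an assumption)."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for sense, score in propagate_sentiment(TOY_GRAPH, TOY_SEEDS).items():
        print(sense, round(score, 2), to_label(score))
```

Because each non-seed score is an average of values already in [-1, 1], the sketch never leaves that range, which mirrors the score range stated in the abstract. A real system would need a far larger graph, a principled stopping criterion, and thresholds tuned against the gold standard.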
