Multi-lingual support for lexicon-based sentiment analysis guided by semantics

Many sentiment analysis methods rely on sentiment lexicons, which list words together with their associated sentiment, and are tailored to one specific language. Yet the ever-growing amount of Web data in different languages makes multi-lingual support increasingly important. In this paper, we assess various methods for supporting an additional target language in lexicon-based sentiment analysis. First, as a baseline, we automatically translate text into a reference language for which a sentiment lexicon is available, and subsequently analyze the translated text. Second, we consider mapping sentiment scores from a semantically enabled sentiment lexicon in the reference language to a new target-language sentiment lexicon, by traversing relations between language-specific semantic lexicons. Last, we consider creating a target-language sentiment lexicon by propagating the sentiment of seed words through a semantic lexicon for the target language. When extending sentiment analysis from English to Dutch, mapping sentiment across languages by exploiting relations between semantic lexicons yields a significant improvement of about 29% over the baseline in terms of both accuracy and macro-level F1 on our data. Propagating sentiment in language-specific semantic lexicons can outperform the baseline by up to about 47%, depending on the seed set of sentiment-carrying words. This indicates that sentiment is not only linked to word meanings, but tends to have a language-specific dimension as well.
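The abstract describes the mapping and propagation approaches only at a high level. The following is a minimal sketch, not the authors' implementation, of how sentiment scores might be mapped across linked lexicon entries, propagated from seed words through a semantic lexicon, and then used to classify text. Everything concrete here is an illustrative assumption: the function names `map_scores`, `propagate`, and `classify`, the toy English and Dutch entries, the synset-style identifiers, averaging as the mapping rule, a fixed decay factor of 0.5 per hop, and the sign convention (+1 for meaning-preserving relations, -1 for antonymy).

```python
from collections import defaultdict

# Hypothetical representation: a semantic lexicon as an adjacency list whose
# edges carry +1 (meaning-preserving relation, e.g. synonymy) or -1 (antonymy).
Graph = dict[str, list[tuple[str, int]]]


def map_scores(reference: dict[str, float],
               links: dict[str, list[str]]) -> dict[str, float]:
    """Mapping sketch: each target-language entry receives the mean score of
    the reference-language entries it is linked to; entries without any
    scored link are skipped."""
    mapped: dict[str, float] = {}
    for target, refs in links.items():
        known = [reference[r] for r in refs if r in reference]
        if known:
            mapped[target] = sum(known) / len(known)
    return mapped


def propagate(seeds: dict[str, float], graph: Graph,
              iterations: int = 3, decay: float = 0.5) -> dict[str, float]:
    """Propagation sketch: spread seed scores along relations for a fixed
    number of hops, multiplying by `decay` and the edge sign at every hop.
    Edges are followed in the stored direction only; a word keeps the score
    of the first hop that reaches it, averaged over all neighbours in that hop."""
    scores = dict(seeds)
    frontier = dict(seeds)
    for _ in range(iterations):
        incoming: dict[str, list[float]] = defaultdict(list)
        for word, score in frontier.items():
            for neighbour, sign in graph.get(word, []):
                if neighbour not in scores:  # never overwrite earlier scores
                    incoming[neighbour].append(sign * decay * score)
        frontier = {w: sum(v) / len(v) for w, v in incoming.items()}
        scores.update(frontier)
    return scores


def classify(tokens: list[str], lexicon: dict[str, float]) -> str:
    """Lexicon-based classification: sum the scores of known tokens and
    use the sign of the total as the document polarity."""
    total = sum(lexicon.get(t, 0.0) for t in tokens)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    # Toy English scores and English-to-Dutch links (invented for illustration).
    english = {"good.a.01": 0.75, "fine.a.01": 0.5}
    links = {"goed": ["good.a.01", "fine.a.01"], "slecht": ["bad.a.01"]}
    print(map_scores(english, links))  # {'goed': 0.625}

    # Toy Dutch semantic lexicon with a single positive seed word.
    dutch: Graph = {"goed": [("prima", 1), ("slecht", -1)],
                    "prima": [("uitstekend", 1)]}
    lexicon = propagate({"goed": 1.0}, dutch)
    print(lexicon)  # {'goed': 1.0, 'prima': 0.5, 'slecht': -0.5, 'uitstekend': 0.25}
    print(classify(["een", "prima", "film"], lexicon))  # positive
```

The decay factor makes words far from any seed carry weaker sentiment, which is one simple way to reflect that meaning (and hence polarity) drifts along chains of relations; the choice of seed set directly determines which parts of the lexicon receive scores at all, in line with the abstract's observation that propagation results depend on the seeds.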
