Emotion Tokens: Bridging the Gap among Multilingual Twitter Sentiment Analysis

Twitter is a microblogging service where worldwide users publish their feelings. However, sentiment analysis for Twitter messages (tweets) is regarded as a challenging problem because tweets are short and informal. In this paper, we focus on this problem by the analysis of emotion tokens, including emotion symbols (e.g. emoticons), irregular forms of words and combined punctuations. According to our observation on five million tweets, these emotion tokens are commonly used (0.47 emotion tokens per tweet). They directly express one's emotion regardless of his language; hence become a useful signal for sentiment analysis on multilingual tweets. Firstly, emotion tokens are extracted automatically from tweets. Secondly, a graph propagation algorithm is proposed to label the tokens' polarities. Finally, a multilingual sentiment analysis algorithm is introduced. Comparative evaluations are conducted among semantic lexicon based approach and some state-of-the-art Twitter sentiment analysis Web services, both on English and non-English tweets. Experimental results show effectiveness of the proposed algorithms.

[1]  Andrea Esuli,et al.  SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining , 2010, LREC.

[2]  Carlo Strapparava,et al.  Learning to identify emotions in text , 2008, SAC '08.

[3]  Balachander Krishnamurthy,et al.  A few chirps about twitter , 2008, WOSN '08.

[4]  Zoubin Ghahramani,et al.  Learning from labeled and unlabeled data with label propagation , 2002 .

[5]  Steven Skiena,et al.  International Sentiment Analysis for News and Blogs , 2021, ICWSM.

[6]  Nicholas Diakopoulos,et al.  Cooooooooooooooollllllllllllll!!!!!!!!!!!!!! Using Word Lengthening to Detect Sentiment in Microblogs , 2011, EMNLP.

[7]  Albert Bifet,et al.  Sentiment Knowledge Discovery in Twitter Streaming Data , 2010, Discovery Science.

[8]  Bing Liu,et al.  Sentiment Analysis and Subjectivity , 2010, Handbook of Natural Language Processing.

[9]  Bernard J. Jansen,et al.  Micro-blogging as online word of mouth branding , 2009, CHI Extended Abstracts.

[10]  Junlan Feng,et al.  Robust Sentiment Detection on Twitter from Biased and Noisy Data , 2010, COLING.

[11]  Johan Bollen,et al.  Modeling Public Mood and Emotion: Twitter Sentiment and Socio-Economic Phenomena , 2009, ICWSM.

[12]  Yu Sun,et al.  Automatic Extraction for Product Feature Words from Comments on the Web , 2009, AIRS.

[13]  Tiejun Zhao,et al.  Target-dependent Twitter Sentiment Classification , 2011, ACL.

[14]  Kerstin Denecke,et al.  Using SentiWordNet for multilingual sentiment analysis , 2008, 2008 IEEE 24th International Conference on Data Engineering Workshop.

[15]  Marie-Francine Moens,et al.  A machine learning approach to sentiment analysis in multilingual Web texts , 2009, Information Retrieval.

[16]  Philip Resnik,et al.  Holistic Sentiment Analysis Across Languages: Multilingual Supervised Latent Dirichlet Allocation , 2010, EMNLP.

[17]  Rada Mihalcea,et al.  A Bootstrapping Method for Building Subjectivity Lexicons for Languages with Scarce Resources , 2008, LREC.

[18]  Rada Mihalcea,et al.  Multilingual Subjectivity: Are More Languages Better? , 2010, COLING.

[19]  Mitsuru Ishizuka,et al.  SentiFul: A Lexicon for Sentiment Analysis , 2011, IEEE Transactions on Affective Computing.

[20]  Patrick Paroubek,et al.  Twitter as a Corpus for Sentiment Analysis and Opinion Mining , 2010, LREC.

[21]  Lillian Lee,et al.  Opinion Mining and Sentiment Analysis , 2008, Found. Trends Inf. Retr..