Cross-lingual Twitter Polarity Detection via Projection across Word-Aligned Corpora

In this paper, we propose an unsupervised framework that leverages the sentiment resources and tools available for English to automatically generate stand-alone polarity lexicons and classifiers for languages with scarce subjectivity resources, thus avoiding the need for labor-intensive manual annotation. Starting with a list of English sentiment-bearing words, we expand this lexicon using WordNet synsets. For each sentence pair in a given bilingual parallel corpus, the high-precision English polarity lexicon is applied to the English side, and the resulting sentiment label is projected onto the target-language side via statistically derived word alignments. The resulting target-language lexicon is then applied to a large pool of unlabeled tweets in the target language to label them automatically, and these labeled tweets serve as training data for a polarity classifier. Our experiments with Spanish and Portuguese as target languages show that the resulting classifiers improve polarity classification performance over lexicon-based classification for under-resourced languages in social media.
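The projection and auto-labeling steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy lexicon, the sentence pairs, the hand-written alignments, the majority-vote rule, and the function names `project_lexicon` and `label_tweet` are all assumptions made for illustration. In practice the alignments would come from a statistical word aligner, the English lexicon would first be expanded with WordNet synsets, and the labeled tweets would go on to train a classifier.

```python
from collections import Counter, defaultdict

# Toy English polarity lexicon (hypothetical entries): +1 positive, -1 negative.
EN_LEXICON = {"good": 1, "great": 1, "bad": -1, "terrible": -1}


def project_lexicon(pairs, en_lexicon, min_count=1):
    """Project English word polarities onto aligned target-language words.

    Each item in `pairs` is (en_tokens, tgt_tokens, alignment), where
    `alignment` is a list of (en_index, tgt_index) links.
    """
    votes = defaultdict(Counter)
    for en_tokens, tgt_tokens, alignment in pairs:
        for i, j in alignment:
            polarity = en_lexicon.get(en_tokens[i].lower())
            if polarity is not None:
                votes[tgt_tokens[j].lower()][polarity] += 1

    lexicon = {}
    for word, counts in votes.items():
        (top, n), = counts.most_common(1)
        # Keep a word only if it has enough votes and the majority is strict.
        if n >= min_count and n > counts[-top]:
            lexicon[word] = top
    return lexicon


def label_tweet(tokens, lexicon):
    """Label a tokenized tweet by summed word polarity; None if undecided."""
    score = sum(lexicon.get(t.lower(), 0) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None


# Hypothetical English-Spanish sentence pairs with hand-written alignments.
PAIRS = [
    (["the", "movie", "was", "great"],
     ["la", "película", "fue", "genial"],
     [(0, 0), (1, 1), (2, 2), (3, 3)]),
    (["a", "terrible", "day"],
     ["un", "día", "terrible"],
     [(0, 0), (1, 2), (2, 1)]),
]

es_lexicon = project_lexicon(PAIRS, EN_LEXICON)
auto_label = label_tweet(["qué", "día", "terrible"], es_lexicon)
```

Tweets for which `label_tweet` returns `None` are simply left out of the automatically labeled pool in this sketch, which trades coverage for cleaner training labels.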
