Cross-Lingual Sentiment Analysis Without (Good) Translation

Current approaches to cross-lingual sentiment analysis try to leverage the wealth of labeled English data using bilingual lexicons, bilingual vector space embeddings, or machine translation systems. Here we show that it is possible to use a single linear transformation, with as few as 2000 word pairs, to capture fine-grained sentiment relationships between words in a cross-lingual setting. We apply these cross-lingual sentiment models to a diverse set of tasks to demonstrate their functionality in a non-English context. By effectively leveraging English sentiment knowledge without the need for accurate translation, we can analyze and extract features from other languages with scarce data at a very low cost, thus making sentiment and related analyses for many languages inexpensive.

[1]  Christopher D. Manning,et al.  Bilingual Word Embeddings for Phrase-Based Machine Translation , 2013, EMNLP.

[2]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[3]  Xiaojun Wan,et al.  Learning Bilingual Embedding Model for Cross-Language Sentiment Classification , 2014, 2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT).

[4]  Phil Blunsom,et al.  Multilingual Distributed Representations without Word Alignment , 2013, ICLR 2014.

[5]  Xiaojun Wan,et al.  Cross-Lingual Sentiment Classification with Bilingual Document Representation Learning , 2016, ACL.

[6]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[7]  Pushpak Bhattacharyya,et al.  Cross-Lingual Sentiment Analysis for Indian Languages using Linked WordNets , 2012, COLING.

[8]  Xiang Zhang,et al.  Character-level Convolutional Networks for Text Classification , 2015, NIPS.

[9]  Saif Mohammad,et al.  Sentiment after Translation: A Case-Study on Arabic Social Media Posts , 2015, NAACL.

[10]  Ivan Titov,et al.  Inducing Crosslingual Distributed Representations of Words , 2012, COLING.

[11]  Manaal Faruqui,et al.  Improving Vector Space Word Representations Using Multilingual Correlation , 2014, EACL.

[12]  M. Bradley,et al.  Measuring emotion: the Self-Assessment Manikin and the Semantic Differential. , 1994, Journal of behavior therapy and experimental psychiatry.

[13]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[14]  Hiroshi Kanayama,et al.  Multilingual Training of Crosslingual Word Embeddings , 2017, EACL.

[15]  Bing Liu,et al.  Mining and summarizing customer reviews , 2004, KDD.

[16]  Quoc V. Le,et al.  Exploiting Similarities among Languages for Machine Translation , 2013, ArXiv.

[17]  Long Chen,et al.  Learning Bilingual Sentiment Word Embeddings for Cross-language Sentiment Classification , 2015, ACL.

[18]  Yoshua Bengio,et al.  BilBOWA: Fast Bilingual Distributed Representations without Word Alignments , 2014, ICML.

[19]  Claire Cardie,et al.  Adversarial Deep Averaging Networks for Cross-Lingual Sentiment Classification , 2016, TACL.

[20]  Christopher Potts,et al.  Learning Word Vectors for Sentiment Analysis , 2011, ACL.

[21]  Hugo Larochelle,et al.  An Autoencoder Approach to Learning Bilingual Word Representations , 2014, NIPS.

[22]  Jin Zhang,et al.  An empirical study of sentiment analysis for chinese documents , 2008, Expert Syst. Appl..

[23]  Sarthak Jain,et al.  Cross Lingual Sentiment Analysis using Modified BRAE , 2015, EMNLP.

[24]  Hiroshi Kanayama,et al.  Learning Crosslingual Word Embeddings without Bilingual Corpora , 2016, EMNLP.

[25]  Saif Mohammad,et al.  Sentiment Lexicons for Arabic Social Media , 2016, LREC.

[26]  Huey Yee Lee,et al.  Chinese Sentiment Analysis Using Maximum Entropy , 2011 .

[27]  TanSongbo,et al.  An empirical study of sentiment analysis for chinese documents , 2008 .

[28]  M. Bradley,et al.  Affective Norms for English Words (ANEW): Instruction Manual and Affective Ratings , 1999 .

[29]  Hang Lei,et al.  An Empirical Study on Sentiment Classification of Chinese Review using Word Embedding , 2015, PACLIC.