Towards a Universal Sentiment Classifier in Multiple Languages

Existing sentiment classifiers usually work for only one specific language, and a different classification model is used for each language. In this paper, we aim to build a universal sentiment classifier that applies a single classification model across multiple languages. To achieve this goal, we propose to learn multilingual sentiment-aware word embeddings jointly, based only on labeled reviews in English and unlabeled parallel data available for a few language pairs. Parallel data between English and each target language are not required, because the sentiment information can be transferred into any language via pivot languages. We evaluate our universal sentiment classifier in five languages, and the results are very promising even when parallel data between English and the target languages are not used. Furthermore, we compare the single universal classifier with several cross-language sentiment classifiers that rely on direct parallel data between the source and target languages, and the results show that it performs very promisingly against these classifiers in multiple target languages.
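The pivot-language idea can be illustrated with a minimal sketch. It only checks which languages are connected to English through a chain of parallel corpora, which is the precondition for transferring sentiment information; it does not implement the embedding learning itself. The function name `pivot_paths` and the language pairs below are hypothetical and are not taken from the paper.

```python
from collections import deque


def pivot_paths(pairs, source="en"):
    """Breadth-first search over language pairs that share parallel data.

    Returns a dict mapping each reachable language to the shortest chain
    of languages (starting at `source`) that links it to the source.
    """
    # Parallel data is symmetric, so build an undirected graph.
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    paths = {source: [source]}
    queue = deque([source])
    while queue:
        lang = queue.popleft()
        # Sort neighbours so the result is deterministic.
        for neighbour in sorted(graph.get(lang, ())):
            if neighbour not in paths:
                paths[neighbour] = paths[lang] + [neighbour]
                queue.append(neighbour)
    return paths


# Hypothetical setup: only German has parallel data with English directly.
pairs = [("en", "de"), ("de", "fr"), ("fr", "es")]
paths = pivot_paths(pairs)
print(paths["es"])
```

In this hypothetical setup, Spanish has no parallel data with English, yet it is still linked to English through German and French acting as pivots. A language that appears in no chain from English would be absent from the returned dictionary.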
