Cross-lingual polarity detection with machine translation

Recent advances in machine translation have fostered interest in its use for sentiment analysis. In this paper, we investigate the prospects and limitations of machine translation in sentiment analysis for the cross-lingual polarity detection task. We focus on improving classification accuracy in a cross-lingual setting where labeled training instances for a particular domain are available in different languages. We experiment with movie review and product review datasets consisting of polar texts in English and Turkish. The results show that expanding the training set with new instances taken from another corpus does not necessarily increase classification accuracy. This is primarily due not to the (not always accurate) machine translation, but to inherent differences between the corpora written in the two languages. Similarly, for co-training classification with machine translation, we observe that the accuracy improvement can be explained by semi-supervised learning with unlabeled data from the same domain rather than by cross-language co-training itself. Our results also show that the artificial noise added by machine translation services does not hinder classifiers much in the polarity detection task. However, it is important to distinguish the effect of machine translation from the effect of merging different cross-lingual data sources; as in transfer learning, we may need to search for ways to account for cross-lingual differences in data distribution.
