Lost in Translations? Building Sentiment Lexicons using Context Based Machine Translation

In this paper, we propose a simple yet effective approach to automatically building sentiment lexicons from English sentiment lexicons using publicly available online machine translation services. The method does not rely on any semantic resources or bilingual dictionaries, and can be applied to many languages. We propose to overcome the low coverage problem by placing each English sentiment word into different contexts to generate different phrases, which effectively prompts the machine translation engine to return different translations for the same English sentiment word. Experimental results on building a Chinese sentiment lexicon (available at https://github.com/fannix/ChineseSentiment-Lexicon) show that the proposed approach significantly improves the coverage of the sentiment lexicon while achieving relatively high precision.
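The context-based idea can be illustrated with a minimal sketch. It is not the authors' implementation: the templates in `TEMPLATES`, the function names, and the `toy_translate` stand-in for an online translation service are all assumptions made for illustration, and the toy outputs are placeholder strings rather than real translations. The abstract does not say how the translated sentiment word is isolated from the translated phrase, so the sketch stops at collecting the distinct phrase-level outputs.

```python
from typing import Callable, List, Sequence

# Hypothetical context templates; the abstract does not list the contexts actually used.
TEMPLATES: Sequence[str] = ("{}", "very {}", "a {} idea")


def build_phrases(word: str, templates: Sequence[str] = TEMPLATES) -> List[str]:
    """Place one English sentiment word into each context template."""
    return [template.format(word) for template in templates]


def collect_translations(
    word: str,
    translate: Callable[[str], str],
    templates: Sequence[str] = TEMPLATES,
) -> List[str]:
    """Translate every phrase built from `word` and keep the distinct, non-empty outputs."""
    distinct: List[str] = []
    for phrase in build_phrases(word, templates):
        output = translate(phrase)
        if output and output not in distinct:
            distinct.append(output)
    return distinct


# Toy stand-in for a machine translation service (assumption, placeholder outputs).
TOY_MT = {"good": "zh_a", "very good": "zh_a", "a good idea": "zh_b"}


def toy_translate(phrase: str) -> str:
    """Return the placeholder translation, or an empty string for unknown phrases."""
    return TOY_MT.get(phrase, "")


if __name__ == "__main__":
    print(collect_translations("good", toy_translate))
```

In practice `translate` would wrap a real service, and the distinct outputs would still need filtering before being added to the target-language lexicon, since more candidates raise coverage only if precision stays acceptable.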
