Cross-Lingual Sentiment Analysis for Indian Languages using Linked WordNets

Cross-Lingual Sentiment Analysis (CLSA) is the task of predicting the polarity of the opinion expressed in a text in a language Ltest using a classifier trained on the corpus of another language Lt rain. Popular approaches use Machine Translation (MT) to convert the test document in Ltest to Lt rain and use the classifier of Lt rain. However, MT systems do not exist for most pairs of languages and even if they do, their translation accuracy is low. So we present an alternative approach to CLSA using WordNet senses as features for supervised sentiment classification. A document in Ltest is tested for polarity through a classifier trained on sense marked and polarity labeled corpora of Lt rain. The crux of the idea is to use the linked WordNets of two languages to bridge the language gap. We report our results on two widely spoken Indian languages, Hindi (450 million speakers) and Marathi (72 million speakers), which do not have an MT system between them. The sense-based approach gives a CLSA accuracy of 72% and 84% for Hindi and Marathi sentiment classification respectively. This is an improvement of 14%-15% over an approach that uses a bilingual dictionary.

[1]  Pushpak Bhattacharyya,et al.  IndoWordNet , 2010, LREC.

[2]  Kentaro Inui,et al.  Dependency Tree-based Sentiment Classification using CRFs with Hidden Variables , 2010, NAACL.

[3]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[4]  Maite Taboada,et al.  Cross-Linguistic Sentiment Analysis: From English to Spanish , 2009, RANLP.

[5]  Hsin-Hsi Chen,et al.  Overview of Multilingual Opinion Analysis Task at NTCIR-7 , 2008, NTCIR.

[6]  Pushpak Bhattacharyya,et al.  Domain-Specific Word Sense Disambiguation combining corpus based and wordnet based parameters , 2009 .

[7]  Pushpak Bhattacharyya,et al.  Harnessing WordNet Senses for Supervised Sentiment Classification , 2011, EMNLP.

[8]  Thomas Mandl,et al.  Multilingual Corpus Development for Opinion Mining , 2010, LREC.

[9]  Pushpak Bhattacharyya,et al.  Synset Based Multilingual Dictionary: Insights, Applications and Challenges , 2008 .

[10]  Bo Pang,et al.  Thumbs up? Sentiment Classification using Machine Learning Techniques , 2002, EMNLP.

[11]  Xiaojun Wan,et al.  Co-Training for Cross-Lingual Sentiment Classification , 2009, ACL.

[12]  Rada Mihalcea,et al.  Multilingual Subjectivity Analysis Using Machine Translation , 2008, EMNLP.

[13]  Christopher Joseph Pal,et al.  Cross Lingual Adaptation: An Experiment on Sentiment Classifications , 2010, ACL.

[14]  Piek Vossen,et al.  EuroWordNet: A multilingual database with lexical semantic networks , 1998, Springer Netherlands.

[15]  Thomas G. Dietterich Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms , 1998, Neural Computation.