A Fall-back Strategy for Sentiment Analysis in Hindi: a Case Study

Sentiment Analysis (SA) research has gained tremendous momentum in recent times. However, there has been little work in this area for Indian languages. In this paper, we propose a fall-back strategy for sentiment analysis of Hindi documents, a problem on which, to the best of our knowledge, no prior work has been done. We first study three approaches to performing SA in Hindi, for which we have developed a sentiment-annotated corpus in the Hindi movie review domain. (A) In the first approach, we train a classifier on this annotated Hindi corpus and use it to classify a new Hindi document. (B) In the second approach, we translate the given document into English and classify it with a classifier trained on standard English movie reviews. (C) In the third approach, we develop a lexical resource called Hindi-SentiWordNet (H-SWN) and implement a majority-score-based strategy to classify the given document.
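The majority-score strategy in approach (C) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The four-entry lexicon `TOY_HSWN` and its scores are hypothetical stand-ins for H-SWN. We assume each word carries a single (positive, negative) score pair, that documents are tokenized on whitespace, and that ties are labeled neutral. The helper names `word_polarity` and `classify_by_majority` are also hypothetical.

```python
from collections import Counter

# HYPOTHETICAL toy lexicon standing in for Hindi-SentiWordNet (H-SWN).
# Assumption: each word maps to one (positive_score, negative_score) pair;
# the scores below are invented for illustration only.
TOY_HSWN = {
    "अच्छा": (0.75, 0.0),    # "good"
    "शानदार": (0.875, 0.0),  # "excellent"
    "बुरा": (0.0, 0.625),    # "bad"
    "बेकार": (0.0, 0.75),    # "useless"
}


def word_polarity(word, lexicon):
    """Return 'positive', 'negative', or None for a single token."""
    scores = lexicon.get(word)
    if scores is None:
        return None  # word not covered by the lexicon
    pos, neg = scores
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return None  # equal scores: treat the word as objective


def classify_by_majority(document, lexicon):
    """Label a document by the majority polarity of its lexicon words.

    Assumptions (not from the source): whitespace tokenization, and
    'neutral' on a tie, including documents with no sentiment-bearing words.
    """
    votes = Counter()
    for token in document.split():
        polarity = word_polarity(token, lexicon)
        if polarity is not None:
            votes[polarity] += 1
    if votes["positive"] > votes["negative"]:
        return "positive"
    if votes["negative"] > votes["positive"]:
        return "negative"
    return "neutral"
```

For example, `classify_by_majority("गाने शानदार थे और अभिनय अच्छा था पर अंत बुरा था", TOY_HSWN)` finds two positive words and one negative word and returns `"positive"`. A real Hindi pipeline would also need to handle inflected forms (e.g. अच्छी vs. अच्छा) and negation, which this sketch ignores.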
