Aspect based Sentiment Analysis in Hindi: Resource Creation and Evaluation

With the rapid growth of online product reviews, sentiment analysis (SA) has attracted considerable attention, for example from online service providers. A number of benchmark datasets covering a wide range of domains have been made available for sentiment analysis, especially in resource-rich languages. In this paper we assess the challenges of SA in Hindi by providing a benchmark setup: we create a high-quality annotated dataset, build machine learning models for sentiment analysis to demonstrate effective use of the dataset, and make the resource available to the community to advance further research. The dataset comprises Hindi product reviews crawled from various online sources. Each sentence of a review is annotated with its aspect terms and their associated sentiment. As classification algorithms we use Conditional Random Field (CRF) for aspect term extraction and Support Vector Machine (SVM) for sentiment classification. Evaluation results show an average F-measure of 41.07% for aspect term extraction and an accuracy of 54.05% for sentiment classification.
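To make the aspect term extraction metric concrete, the following is a minimal sketch of how a span-level F-measure could be computed. It rests on assumptions the abstract does not state: aspect terms are encoded with B/I/O tags, a predicted term counts as correct only on an exact span match, and the names `bio_to_spans` and `span_prf` are hypothetical helpers, not the authors' evaluation code.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect aspect-term spans from one sentence's B/I/O tag sequence."""
    spans: Set[Span] = set()
    start = None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:      # close the previous term
                spans.add((start, i))
            start = i
        elif tag == "I":
            if start is None:          # stray "I": treat it as a term start
                start = i
        else:                          # "O" closes any open term
            if start is not None:
                spans.add((start, i))
                start = None
    if start is not None:              # term running to the end of the sentence
        spans.add((start, len(tags)))
    return spans


def span_prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Precision, recall and F-measure over exact-match aspect-term spans."""
    tp = fp = fn = 0
    for gold_tags, pred_tags in zip(gold, pred):
        gold_spans = bio_to_spans(gold_tags)
        pred_spans = bio_to_spans(pred_tags)
        tp += len(gold_spans & pred_spans)
        fp += len(pred_spans - gold_spans)
        fn += len(gold_spans - pred_spans)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy tag sequences (illustrative only, not taken from the dataset).
gold = [["O", "B", "I", "O"], ["B", "O", "O", "B"]]
pred = [["O", "B", "I", "O"], ["O", "O", "O", "B"]]
precision, recall, f1 = span_prf(gold, pred)
```

In this toy case the predictions recover two of the three gold aspect terms with no spurious ones, so precision is 1.0 and recall is 2/3. A more lenient criterion, such as partial overlap, would give different scores.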
