A novel sentiment aware dictionary for multi-domain sentiment classification

Abstract Sentiment analysis is a subarea of Natural Language Processing (NLP) that extracts users' opinions and classifies them by polarity. The task has many applications, but it is domain dependent, and annotating corpora in every domain of interest before training a classifier is costly. We attempt to address this problem by building a sentiment-aware dictionary from multi-domain data. The dictionary is created using labeled data from the source domain and unlabeled data from both the source and target domains, and is then used to classify the unlabeled reviews of the target domain. The work is carried out in Hindi, an official language of India; the number of Hindi web pages has grown rapidly since the introduction of UTF-8 encoding. Compared with Hindi SentiWordNet (HSWN), a general lexicon of word polarity, the proposed method labels 23–24% more words of the target domain. For the words that HSWN covers, the labels assigned by our method match the HSWN labels with 76% accuracy.
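
The two comparisons reported against HSWN (how many target-domain words each lexicon labels, and how often the labels agree on shared words) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `coverage` and `agreement`, the `"pos"`/`"neg"` label scheme, and the toy English words standing in for Hindi vocabulary are all assumptions made for illustration, and the abstract does not specify exactly how the 23–24% figure was computed.

```python
# Hypothetical sketch: comparing a domain-built lexicon with a general lexicon.
# All names and data below are illustrative assumptions, not from the paper.

def coverage(lexicon, vocabulary):
    """Fraction of vocabulary words that the lexicon assigns a label to."""
    if not vocabulary:
        return 0.0
    labeled = sum(1 for word in vocabulary if word in lexicon)
    return labeled / len(vocabulary)


def agreement(lexicon_a, lexicon_b):
    """Fraction of words present in both lexicons that carry the same label."""
    shared = lexicon_a.keys() & lexicon_b.keys()
    if not shared:
        return 0.0
    matching = sum(1 for word in shared if lexicon_a[word] == lexicon_b[word])
    return matching / len(shared)


# Toy target-domain vocabulary (placeholder words, assumed for the example).
target_vocab = {"good", "bad", "crisp", "blurry", "slow"}

# Labels from a hypothetical domain-aware dictionary and a general lexicon.
proposed = {"good": "pos", "bad": "neg", "crisp": "pos", "blurry": "neg"}
general = {"good": "pos", "bad": "neg", "crisp": "neg"}

proposed_cov = coverage(proposed, target_vocab)
general_cov = coverage(general, target_vocab)
label_agreement = agreement(proposed, general)

print(f"coverage (proposed): {proposed_cov:.2f}")
print(f"coverage (general):  {general_cov:.2f}")
print(f"agreement on shared words: {label_agreement:.2f}")
```

Agreement is computed only over words that both lexicons label, which mirrors the abstract's phrase "for the available words"; words covered by just one lexicon affect coverage but not agreement.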
