AMRITA_CEN-NLP@SAIL2015: Sentiment Analysis in Indian Language Using Regularized Least Square Approach with Randomized Feature Learning

This work was carried out as part of the shared task on Sentiment Analysis in Indian Languages (SAIL 2015), under the constrained category. The task is to classify Twitter data into three polarity categories: positive, negative and neutral. For training, Twitter datasets were provided in three languages: Hindi, Bengali and Tamil. Each dataset contained three separate categories of tweets, namely positive, negative and neutral. Ours was the only team in this shared task that participated in all three languages. The proposed method used binary features, statistical features generated from SentiWordNet, and a word-presence binary feature. Because the generated features are sparse, the input features were mapped to a random Fourier feature space to obtain a separation, and linear classification was then performed using the regularized least squares method. In the test data provided for Hindi and Bengali, the proposed method identified more negative tweets. In the test tweets for Tamil, positive tweets were identified more often than those of the other two polarity categories. Owing to the lack of language-specific and sentiment-oriented features, neutral tweets were identified less often, and misclassifications occurred in all three polarity categories. These results motivate us to take our research in this area forward with the proposed method.
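
The pipeline described above (map sparse features to a random Fourier feature space, then fit a linear classifier by regularized least squares) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a Gaussian (RBF) kernel approximation, one-hot targets for the three polarity classes, and a closed-form ridge solution; the function names and the parameters `n_features`, `gamma` and `lam` are hypothetical and chosen only for illustration, as is the toy data.

```python
import numpy as np

def random_fourier_features(X, W, b):
    """Map X (n_samples, d) to cosine features approximating an RBF kernel."""
    D = W.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

def fit_rls(Z, Y, lam):
    """Closed-form regularized least squares: (Z^T Z + lam I)^-1 Z^T Y."""
    D = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(D), Z.T @ Y)

def train(X, y, n_classes=3, n_features=500, gamma=0.5, lam=1e-2, seed=0):
    """Draw the random projection, map the inputs, and fit the linear model."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Frequencies drawn from N(0, 2*gamma) approximate exp(-gamma * ||x - x'||^2).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    Z = random_fourier_features(X, W, b)
    Y = np.eye(n_classes)[y]  # one-hot targets, one column per polarity class
    coef = fit_rls(Z, Y, lam)
    return W, b, coef

def predict(X, model):
    """Return the class whose linear score is highest."""
    W, b, coef = model
    return np.argmax(random_fourier_features(X, W, b) @ coef, axis=1)

# Toy binary word-presence vectors (hypothetical data, not from the task).
X_toy = np.array([[1, 0, 0, 0], [1, 1, 0, 0],
                  [0, 0, 1, 0], [0, 1, 1, 0],
                  [0, 0, 0, 1], [0, 0, 1, 1]], dtype=float)
y_toy = np.array([0, 0, 1, 1, 2, 2])  # 0 = positive, 1 = negative, 2 = neutral
model = train(X_toy, y_toy)
labels = predict(X_toy, model)
```

In this sketch the nonlinearity lives entirely in the random mapping, so the classifier itself stays linear and training reduces to solving one linear system per run.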
