Resource creation for opinion mining: a case study with Marathi movie reviews

With the rapid growth of user-generated content on the Web, various NLP research areas are emerging to use this information in ways that help users work with the data efficiently. Opinion mining is one such area, and it is attracting growing interest from researchers who aim to develop automated NLP systems that can analyze sentiments expressed in natural language. Because opinion mining is a language- and domain-dependent task, such systems require language-specific resources for better results. Several studies on this theme have been presented using a number of techniques, most of which focus mainly on English. Essential resources such as corpora, lexicons, and parsers are scarce for resource-poor languages. In this paper, we present our experiments on the construction of an opinion corpus and a sentiment lexicon to be used for mining opinions from Marathi text. The corpus is constructed from review documents in one of the popular opinion mining domains, namely movie reviews. Different experiments have been carried out to validate the resources. The lexicon-based, document-level polarity classification system attained F-measures of 0.75 and 0.56 for the positive and negative classes, respectively. The results encourage us to continue this line of research with further work on the resources and improvements to the system.
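As a rough illustration of what lexicon-based, document-level polarity classification and per-class F-measure involve, the following minimal sketch may help. It is not the system described in the paper: the toy lexicon (English placeholder words instead of Marathi entries), the score-summing rule, the tie-breaking choice, and the names `TOY_LEXICON`, `classify`, and `f_measure` are all assumptions made for illustration only.

```python
from typing import Dict, List

# Hypothetical toy lexicon: word -> polarity score (+1 positive, -1 negative).
# English placeholders are used here; a real lexicon would hold Marathi entries.
TOY_LEXICON: Dict[str, int] = {"good": 1, "great": 1, "boring": -1, "weak": -1}


def classify(tokens: List[str], lexicon: Dict[str, int]) -> str:
    """Label a tokenized document by summing the lexicon scores of its words.

    Words missing from the lexicon contribute 0. A zero total is labeled
    positive, which is an arbitrary tie-breaking assumption.
    """
    score = sum(lexicon.get(tok, 0) for tok in tokens)
    return "pos" if score >= 0 else "neg"


def f_measure(gold: List[str], pred: List[str], label: str) -> float:
    """Compute the F-measure (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Computing the F-measure separately for each class, as in `f_measure(gold, pred, "pos")` and `f_measure(gold, pred, "neg")`, is what allows the two classes to be reported with different scores.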
