Capturing user sentiments for online Indian movie reviews: A comparative analysis of different machine-learning models

Purpose: Sentiment analysis and opinion mining are emerging areas of research for analyzing Web data and capturing users' sentiments. This research presents a sentiment analysis of an Indian movie review corpus using natural language processing and several machine learning classifiers.

Design/methodology/approach: Three machine learning classifiers (Bayesian, naive Bayesian and support vector machine [SVM]) were compared. All the classifiers were trained on words/features extracted from the corpus using five different feature selection algorithms (Chi-square, info-gain, gain ratio, one-R and relief-F [RF] attributes), and the feature selection algorithms were also compared with one another. The classifiers and feature selection approaches were evaluated using three metrics (F-value, false-positive [FP] rate and training time).

Findings: For the maximum number of features, the RF feature selection approach performed best, giving better F-values, a low FP rate and less time needed to train the classifiers, whereas for the least number of features, one-R was better than RF. Among the machine learning classifiers, SVM was found to be superior, although the Bayesian classifier was comparable with SVM.

Originality/value: The novelty of this research is that Indian review data were collected and a classification model for sentiment polarity (positive/negative) was then constructed from them.
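
As an illustration of the quantities named in the abstract, the following minimal Python sketch scores a term with the Chi-square statistic and computes the F-value and FP rate for a binary polarity classifier. It is not the authors' implementation: the abstract does not state the toolkit used, the function names `chi_square` and `f_value_and_fp_rate` are hypothetical, the toy reviews are invented, labels are assumed to be 1 for positive and 0 for negative, and the F-value is assumed to be the balanced F1 score.

```python
def chi_square(docs, labels, term):
    """Chi-square statistic of a term against a binary label (2x2 table).

    docs   -- list of token sets, one per review (toy representation)
    labels -- list of 1 (positive) / 0 (negative), assumed encoding
    """
    n = len(docs)
    a = sum(1 for d, y in zip(docs, labels) if term in d and y == 1)      # term present, positive
    b = sum(1 for d, y in zip(docs, labels) if term in d and y == 0)      # term present, negative
    c = sum(1 for d, y in zip(docs, labels) if term not in d and y == 1)  # term absent, positive
    e = sum(1 for d, y in zip(docs, labels) if term not in d and y == 0)  # term absent, negative
    denom = (a + c) * (b + e) * (a + b) * (c + e)
    return n * (a * e - b * c) ** 2 / denom if denom else 0.0


def f_value_and_fp_rate(y_true, y_pred):
    """Return (F1 score, false-positive rate) for binary predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    return f1, fp_rate


# Invented toy corpus: two positive and two negative reviews.
docs = [{"great", "film"}, {"great", "songs"}, {"boring", "film"}, {"boring", "plot"}]
labels = [1, 1, 0, 0]

# Rank candidate terms by Chi-square score, highest first.
vocabulary = sorted(set().union(*docs))
ranked = sorted(vocabulary, key=lambda t: chi_square(docs, labels, t), reverse=True)

# Evaluate some hypothetical classifier output against the true labels.
f1, fp_rate = f_value_and_fp_rate(labels, [1, 0, 1, 0])
```

In this toy corpus a term such as "great", which appears only in positive reviews, scores higher than "film", which appears equally in both classes. Keeping only the top-ranked terms is the sense in which a feature selection method reduces the number of features passed to a classifier.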
