A Comparative Study of Feature Selection and Machine Learning Methods for Sentiment Classification on Movie Data Set

Sentiment analysis has become a leading research domain with the advent of Web 2.0, where Web users express their opinions in user forums, blogs, discussion boards, and review sites. This online information is considered a valuable source for decision making, for improving quality of service, and for helping service providers enhance their competitiveness. Because processing high-dimensional text data does not scale well, feature selection mechanisms are used to restrict the study to the most informative features. These features are then used to train a classifier, with the aim of improving the accuracy of sentiment classification. This paper explores six feature selection mechanisms (IG, GR, CHI, OneR, Relief-F, and SAE) in combination with five machine learning classifiers (SVM, NB, DT, K-NN, and ME) and reports the accuracy of each combination on the movie review data set. Comparative results show that Naive Bayes (NB) outperforms the other classifiers and works better with the Gain Ratio (GR) and Significance Attribute Evaluation (SAE) feature selection methods.
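
The select-then-classify pipeline described above can be sketched in a few lines. This is a minimal illustration only, not the paper's experimental setup: it assumes information gain (the usual reading of IG) as the selection score and a multinomial Naive Bayes classifier with add-one smoothing. The toy corpus `DOCS` and the function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus for illustration (not the paper's movie review data).
DOCS = [
    ("a great and moving film", "pos"),
    ("great acting and a wonderful story", "pos"),
    ("a wonderful moving story", "pos"),
    ("a boring and awful film", "neg"),
    ("awful acting and a boring story", "neg"),
    ("a boring plot", "neg"),
]


def entropy(counts):
    """Shannon entropy (in bits) of a collection of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def information_gain(term, docs):
    """Reduction in class entropy from knowing whether `term` occurs in a document."""
    labels = Counter(label for _, label in docs)
    with_term = Counter(label for text, label in docs if term in text.split())
    without_term = labels - with_term
    n = len(docs)
    conditional = sum(
        sum(part.values()) / n * entropy(part.values())
        for part in (with_term, without_term)
        if part
    )
    return entropy(labels.values()) - conditional


def select_features(docs, k):
    """Return the k terms with the highest information gain (ties broken alphabetically)."""
    vocab = {t for text, _ in docs for t in text.split()}
    return sorted(vocab, key=lambda t: (-information_gain(t, docs), t))[:k]


def train_nb(docs, features):
    """Collect class priors and per-class term counts, restricted to selected features."""
    priors = Counter(label for _, label in docs)
    term_counts = {label: Counter() for label in priors}
    for text, label in docs:
        term_counts[label].update(t for t in text.split() if t in features)
    return priors, term_counts


def predict(text, model, features):
    """Pick the class with the highest log-posterior under add-one smoothing."""
    priors, term_counts = model
    n = sum(priors.values())
    scores = {}
    for label in priors:
        total = sum(term_counts[label].values()) + len(features)
        score = math.log(priors[label] / n)
        for t in text.split():
            if t in features:
                score += math.log((term_counts[label][t] + 1) / total)
        scores[label] = score
    return max(scores, key=scores.get)


if __name__ == "__main__":
    selected = select_features(DOCS, 5)
    model = train_nb(DOCS, set(selected))
    print(selected)
    print(predict("a great wonderful film", model, set(selected)))
```

In this sketch, terms that appear equally often in both classes (such as "a" or "and") score zero and are dropped, so the classifier is trained only on the terms that separate the classes. Swapping `information_gain` for another scoring function would give a different selection method under the same pipeline.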
