Sentiment Analysis on Movie Reviews Using Ensemble Features and Pearson Correlation Based Feature Selection

Microblogging has become a very popular information medium among internet users, which makes it a rich source of opinions and reviews, especially movie reviews. We propose sentiment analysis of movie reviews using ensemble features and Bag of Words features, with Pearson's correlation feature selection to reduce the feature dimension and obtain the optimal feature combination, thereby improving classification performance. Classification uses several Naïve Bayes models: Bernoulli Naïve Bayes for binary data, Gaussian Naïve Bayes for continuous data, and Multinomial Naïve Bayes for numeric data. With non-standardized words in the tweets, the evaluation yields an accuracy of 82%, precision of 86%, recall of 79.62%, and f-measure of 82.69% using 20% feature selection. After manual word standardization, accuracy increases by 8 percentage points to 90%, with precision of 92%, recall of 88.46%, and f-measure of 90.19% using 85% feature selection. These results indicate that word standardization improves classification performance and that Pearson's correlation feature selection provides an optimal feature combination while reducing the total number of feature dimensions.
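As a rough illustration of Pearson correlation based feature selection, the sketch below scores each feature column by the absolute value of its Pearson correlation with the class label and keeps the top-ranked fraction. It is not the authors' implementation: the function names (`pearson`, `select_features`), the toy data, and the reading of "20% feature selection" as "retain the top 20% of features" are all assumptions made for this example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Returns 0.0 when either sequence is constant (correlation undefined).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(rows, labels, keep_fraction):
    """Return sorted indices of the features kept after ranking by |r|.

    Assumption: keep_fraction is the share of features retained,
    e.g. 0.2 keeps the top 20% of columns.
    """
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, labels)), j))
    # Highest absolute correlation first; ties go to the lower column index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    n_keep = max(1, round(keep_fraction * n_features))
    return sorted(j for _, j in scores[:n_keep])


# Hypothetical toy data: 4 documents, 4 features, binary sentiment labels.
rows = [
    [1, 0, 1, 3],
    [1, 0, 0, 2],
    [0, 1, 1, 3],
    [0, 1, 0, 2],
]
labels = [1, 1, 0, 0]

selected = select_features(rows, labels, keep_fraction=0.5)
reduced_rows = [[row[j] for j in selected] for row in rows]
```

In this toy data the first two columns track the label perfectly (positively and negatively), while the last two are uncorrelated with it, so keeping half of the features retains columns 0 and 1. The reduced matrix could then be passed to whichever Naïve Bayes variant matches the feature type.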
