Improving Sentiment Analysis of Arabic Tweets by One-way ANOVA

Abstract

Social media has become an integral part of modern life. As a result, it is full of people's opinions, emotions, ideas, and attitudes, whether positive or negative. This abundance of views creates many opportunities for applying sentiment analysis to the education sector, which reflects how countries and cultures develop. In this research, a real-world Twitter dataset was collected, containing approximately 8,144 tweets related to a Saudi university. The main aim of this experimental study was to explore whether one-way analysis of variance (ANOVA) can serve as a feature selection method that considerably reduces the number of features when classifying opinions conveyed through Arabic tweets. The primary motivation was that no previous study had comprehensively examined one-way ANOVA as a way to tackle the curse of dimensionality and to enhance classification performance in sentiment analysis of Arabic tweets. Therefore, various experiments were conducted to investigate how selecting important features with one-way ANOVA affects the performance of different supervised machine learning classifiers. Support Vector Machine and Naive Bayes achieved the best results with one-way ANOVA, improving on the baseline experimental results on the collected dataset. The differences between all results were also analyzed statistically. As further evidence, one-way ANOVA combined with Support Vector Machine performed well across different Arabic benchmark datasets, with its results outperforming those of other studies.
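To make the idea of one-way ANOVA as a feature selector concrete, the following is a minimal sketch and not the authors' implementation. It scores each feature (for example, a term count) by the one-way ANOVA F statistic across sentiment classes and keeps the highest-scoring features. The function names `anova_f_score` and `select_top_k`, the toy term-count matrix, and the choice of `k` are all assumptions made for illustration; the abstract does not specify the feature representation or the number of features retained.

```python
from collections import defaultdict


def anova_f_score(values, labels):
    """One-way ANOVA F statistic of a single feature across class groups."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)

    n = len(values)   # total number of samples
    k = len(groups)   # number of classes
    if k < 2 or n <= k:
        return 0.0    # F is undefined; treat the feature as uninformative

    grand_mean = sum(values) / n
    # Between-group sum of squares: spread of class means around the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    # Within-group sum of squares: spread of samples around their class mean.
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )
    if ss_within == 0.0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_k(X, y, k):
    """Return the indices of the k highest-scoring features and all scores."""
    n_features = len(X[0])
    scores = [anova_f_score([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k]), scores


# Hypothetical toy data: rows are tweets, columns are term counts.
X = [
    [2, 0, 1],
    [3, 0, 0],
    [2, 1, 1],
    [0, 2, 1],
    [0, 3, 0],
    [1, 2, 1],
]
y = ["pos", "pos", "pos", "neg", "neg", "neg"]

selected, scores = select_top_k(X, y, k=2)
print(selected, scores)
```

In this toy example the third column has the same distribution in both classes, so its F score is zero and it is discarded, while the first two columns separate the classes and are kept. In practice, scikit-learn's `f_classif` scoring function used with `SelectKBest` computes the same statistic and can feed the reduced feature matrix to a classifier such as a Support Vector Machine.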
