Ensemble of Classification Algorithms for Subjectivity and Sentiment Analysis of Arabic Customers' Reviews

Sentiment analysis is a challenging and important task that draws on natural language processing, web mining and machine learning. To date, few studies have addressed sentiment classification for Arabic, owing to the lack of resources for handling sentiments and opinions, such as sentiment lexicons and opinion corpora. The main obstacle in Arabic sentiment analysis is that the words and phrases Arabic web users employ to express sentiment are highly subject to usage trends. In addition, the use of dialectal words and phrases adds ambiguity to the analysis of Arabic sentiments and opinions. To address these problems, this study proposes a framework based on an ensemble of machine learning classifiers for subjectivity and sentiment analysis of Arabic customer reviews. First, three well-known text classification algorithms, Naive Bayes, the Rocchio classifier and support vector machines, are adopted as base classifiers. Second, we compare two kinds of ensemble methods: fixed combination and meta-classifier combination. The experimental results show that the ensemble of classifiers improves classification effectiveness in terms of macro-F1 at both levels, subjectivity analysis and sentiment classification. The best macro-F1 results obtained for subjectivity analysis and sentiment classification are 97.13% and 90.95%, respectively.
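To make the fixed combination idea and the macro-F1 measure concrete, the following is a minimal sketch rather than the implementation used in the study. It assumes majority voting as the fixed rule (the abstract does not say which rules were tested), and the label lists are invented toy predictions standing in for the outputs of the three base classifiers; they are not data or results from the paper. The names `majority_vote` and `macro_f1` are hypothetical helpers. Meta-classifier combination, where a second-level classifier is trained on the base outputs, is not shown.

```python
from collections import Counter


def majority_vote(predictions):
    """Fixed combination by majority vote (an assumed rule, for illustration).

    `predictions` is a list of label lists, one per base classifier,
    all aligned on the same documents.
    """
    combined = []
    for votes in zip(*predictions):
        # With three voters and two classes there is never a tie.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels only: invented to show the mechanics, not taken from the paper.
y_true = ["pos", "neg", "pos", "neg", "pos", "neg"]
nb_pred = ["pos", "neg", "neg", "neg", "pos", "pos"]       # stand-in for Naive Bayes
rocchio_pred = ["pos", "pos", "pos", "neg", "pos", "neg"]  # stand-in for Rocchio
svm_pred = ["neg", "neg", "pos", "neg", "pos", "neg"]      # stand-in for SVM

ensemble_pred = majority_vote([nb_pred, rocchio_pred, svm_pred])
nb_score = macro_f1(y_true, nb_pred)
ensemble_score = macro_f1(y_true, ensemble_pred)
```

In this toy case each base classifier errs on different documents, so the vote corrects the individual mistakes. That is the intuition behind combining classifiers, though the size of any gain on real reviews depends on how correlated their errors are.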
