An Experimental Study on Sentiment Classification of Algerian Dialect Texts

Abstract: This paper studies and compares well-known, commonly used sentiment analysis methods for evaluating the opinions and emotions expressed in Algerian dialect texts. The task is ternary sentiment classification. Using several combinations of text preprocessing and data representation techniques, we compare the results of Deep Learning models with those of other commonly used approaches (random forest, maximum entropy, SVM, and a lexicon-based method, for which we tested several lexicons). In our experiments, the Deep Learning models clearly outperform the baseline and achieve better accuracy, especially the CNN. To improve modelling results, we set a new baseline for future work: embeddings integrated into the training model. We experimented with different models and data representations, including "contextual embeddings", a recent approach that appeared in 2018 and gained popularity in the NLP community in 2019. Our results open avenues for further research in this domain.
