Tunisian Dialect Sentiment Analysis: A Natural Language Processing-based Approach

Social media platforms have witnessed a significant increase in posts written in the Tunisian dialect since the Tunisian uprising at the end of 2010. Most of these tweets and comments reflect the Tunisian public's impressions of major social, economic and political events. These opinions have been tracked, analyzed and evaluated with sentiment analysis systems. In the current study, we investigate the impact of several preprocessing techniques on sentiment analysis using two sentiment classification models: a supervised model and a lexicon-based model. These models were trained on three Tunisian datasets of different sizes covering multiple domains. Our results highlight the positive impact of the preprocessing phase on the evaluation measures of both sentiment classifiers: the baseline was significantly outperformed when stemming, emoji recognition and negation detection were applied. Moreover, integrating named entities with these tasks improved the performance of the lexicon-based classifier on all datasets, and that of the supervised model on the medium- and small-sized datasets.
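To make the lexicon-based pipeline more concrete, here is a minimal sketch of a lexicon-based polarity classifier that applies three of the preprocessing steps named in the abstract: light stemming, emoji recognition and negation detection. It is an illustration under stated assumptions, not the authors' implementation. The lexicon, emoji map, negation particles and suffix list are hypothetical toy values written in Latin-script (Arabizi-style) Tunisian, and the function names are invented for this example. The named-entity step is omitted.

```python
import re

# Hypothetical toy resources for illustration only; these are NOT the
# lexicon, emoji list or negation particles used in the study.
LEXICON = {"behi": 1, "mezyen": 1, "khayeb": -1, "mesekh": -1}
EMOJI_POLARITY = {"😀": 1, "😍": 1, "👍": 1, "😡": -1, "😢": -1, "👎": -1}
NEGATORS = {"mouch", "mech", "mch", "ma"}
SUFFIXES = ("in", "a")                  # hypothetical inflectional suffixes
SCOPE_BREAKERS = {".", ",", "!", "?"}   # punctuation that ends a negation scope


def tokenize(text):
    """Lowercase and split into word tokens and single non-space symbols.

    Each emoji above is a single code point, so it becomes its own token.
    """
    return re.findall(r"\w+|\S", text.lower())


def light_stem(token):
    """Strip one suffix, but only when the remaining stem is a lexicon entry."""
    for suffix in SUFFIXES:
        stem = token[: -len(suffix)]
        if token.endswith(suffix) and stem in LEXICON:
            return stem
    return token


def polarity_score(text):
    """Sum lexicon and emoji polarities, flipping the word after a negator."""
    score = 0
    negate = False
    for tok in tokenize(text):
        if tok in NEGATORS:
            negate = True
        elif tok in SCOPE_BREAKERS:
            negate = False
        elif tok in EMOJI_POLARITY:
            score += EMOJI_POLARITY[tok]  # emojis are not negated in this sketch
        else:
            value = LEXICON.get(light_stem(tok), 0)
            if value:
                score += -value if negate else value
                negate = False  # negation covers one sentiment word only


    return score


def classify(text):
    """Map the summed score to a positive/negative/neutral label."""
    score = polarity_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for post in ["film behi 👍", "mouch behia", "khayeb 😡", "ahla"]:
        print(post, "->", classify(post))
```

With these toy resources, the four sample posts are labeled positive, negative, negative and neutral. In "mouch behia", the stemmer maps "behia" to "behi", and the preceding negator flips its polarity. The stemmer is lexicon-guided, meaning it only strips a suffix when the result is a known entry. This avoids over-stemming unknown words. The one-word negation scope is a deliberate simplification.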
