An Algerian Corpus and an Annotation Platform for Opinion and Emotion Analysis

In this paper, we address the lack of resources for opinion and emotion analysis in North African dialects, targeting the Algerian dialect. We present TWIFIL (TWItter proFILing), a collaborative annotation platform for crowdsourcing the annotation of tweets at different levels of granularity. The platform enabled the creation of the largest Algerian dialect dataset annotated for sentiment (9,000 tweets), emotion (about 5,000 tweets), and extra-linguistic information, including author profiling (age and gender). The annotation also resulted in the largest Algerian dialect subjectivity lexicon, with about 9,000 entries, which can constitute a valuable resource for the development of future NLP applications for the Algerian dialect. To test the validity of the dataset, we conducted a set of deep learning experiments that classify a given tweet as positive, negative, or neutral. We discuss our results and provide an error analysis to better identify classification errors.
