SentiALG: Automated Corpus Annotation for Algerian Sentiment Analysis

Data annotation is an important but time-consuming and costly procedure. To sort texts into two classes, the very first thing needed is a good annotation guideline that establishes what a text requires to qualify for each class. In the literature, the difficulties associated with appropriate data annotation have been underestimated. In this paper, we present a novel approach to automatically construct an annotated sentiment corpus for the Algerian dialect (a Maghrebi Arabic dialect). The construction of this corpus is based on an Algerian sentiment lexicon that is also constructed automatically. The presented work deals with the two scripts most widely used on Arabic social media: Arabic and Arabizi. The proposed approach automatically constructs a sentiment corpus containing 8000 messages (4000 in Arabic script and 4000 in Arabizi). The best F1-scores achieved are 72% on the Arabic test set and 78% on the Arabizi test set. Ongoing work aims to integrate a transliteration process for Arabizi messages to further improve these results.
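The abstract does not spell out how the lexicon turns a raw message into a sentiment label. As an illustration only, the sketch below assumes the simplest lexicon-based rule. It sums the polarities of the lexicon words found in a message, labels the message by the sign of that sum, and discards messages with no hits or a tie. The lexicon entries (`mlih`/`مليح` for "good", `khayeb`/`خايب` for "bad") and every function name are hypothetical. They are not the SentiALG lexicon or the authors' code.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical toy lexicon (NOT the SentiALG lexicon): word -> polarity,
# +1 for positive and -1 for negative, with entries in both scripts.
TOY_LEXICON: Dict[str, int] = {
    "mlih": 1,     # Arabizi, "good"
    "مليح": 1,     # Arabic script, "good"
    "khayeb": -1,  # Arabizi, "bad"
    "خايب": -1,    # Arabic script, "bad"
}


def tokenize(message: str) -> List[str]:
    # \w is Unicode-aware in Python 3, so it keeps Arabic letters as well as
    # the digits Arabizi uses to write Arabic sounds (e.g. "3", "7").
    return re.findall(r"\w+", message.lower())


def polarity_score(message: str, lexicon: Dict[str, int]) -> int:
    # Sum the polarities of all lexicon words found in the message.
    return sum(lexicon.get(token, 0) for token in tokenize(message))


def auto_label(message: str, lexicon: Dict[str, int]) -> Optional[str]:
    # Assumed rule: the sign of the score decides the class; a score of 0
    # (no lexicon hits, or a tie) leaves the message unlabeled.
    score = polarity_score(message, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None


def build_corpus(
    messages: Iterable[str], lexicon: Dict[str, int]
) -> List[Tuple[str, str]]:
    # Keep only the messages that received a label.
    corpus = []
    for message in messages:
        label = auto_label(message, lexicon)
        if label is not None:
            corpus.append((message, label))
    return corpus


if __name__ == "__main__":
    messages = [
        "had el film mlih bezaf",
        "service khayeb",
        "rani f dar",
        "الماتش كان مليح",
    ]
    for text, label in build_corpus(messages, TOY_LEXICON):
        print(label, "|", text)
```

Dropping ambiguous messages trades corpus size for label precision. This is one plausible design choice for automatic annotation, and the paper's actual labeling and filtering criteria may differ.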
