AraNet: A Deep Learning Toolkit for Arabic Social Media

We describe AraNet, a collection of deep learning tools for processing Arabic social media. Specifically, we exploit a large number of publicly available and novel social media datasets to train Bidirectional Encoder Representations from Transformers (BERT) models to predict age, dialect, gender, emotion, irony, and sentiment. AraNet achieves state-of-the-art performance on several of these tasks and performs competitively on the others. In addition, AraNet is built entirely on deep learning and therefore requires no feature engineering. To the best of our knowledge, AraNet is the first toolkit to perform predictions across such a wide range of Arabic NLP tasks, and it thus meets a critical need. We publicly release AraNet to accelerate research and to facilitate comparisons across the different tasks.
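
The abstract does not spell out how each task is attached to BERT. In the standard BERT fine-tuning recipe (Devlin et al., 2019), a classification task is learned by adding a linear layer over the final hidden vector of the [CLS] token and applying a softmax over the task's labels. The sketch below assumes that recipe and shows only the inference side of such a head in plain Python. The label sets, the toy 3-dimensional vector (BERT-Base actually uses 768 dimensions), and the weights are hypothetical placeholders, not AraNet's trained parameters or API.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical label sets for illustration only; AraNet's actual label
# inventories (e.g., its dialect or emotion classes) are not given here.
TASK_LABELS: Dict[str, List[str]] = {
    "gender": ["female", "male"],
    "irony": ["ironic", "not_ironic"],
    "sentiment": ["negative", "neutral", "positive"],
}


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(
    cls_vector: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
    labels: Sequence[str],
) -> Dict[str, float]:
    """Linear head over a sentence vector: logits = W h + b, then softmax."""
    if len(weights) != len(labels) or len(bias) != len(labels):
        raise ValueError("need one weight row and one bias per label")
    if any(len(row) != len(cls_vector) for row in weights):
        raise ValueError("each weight row must match the vector dimension")
    logits = [
        sum(w_i * h_i for w_i, h_i in zip(row, cls_vector)) + b
        for row, b in zip(weights, bias)
    ]
    return dict(zip(labels, softmax(logits)))


# Toy usage: a 3-dim stand-in for a [CLS] vector and an identity weight
# matrix (hypothetical values), so the logits equal the vector itself.
h = [0.5, -1.0, 2.0]
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b = [0.0, 0.0, 0.0]
probs = classify(h, W, b, TASK_LABELS["sentiment"])
pred = max(probs, key=probs.get)
print(pred, probs)
```

Under this assumed design, each task shares the same kind of head and differs only in its label set and trained weights, which is one reason a single BERT-based framework can cover many tasks without task-specific feature engineering.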
