Arabic Language Processing: From Theory to Practice: 7th International Conference, ICALP 2019, Nancy, France, October 16–17, 2019, Proceedings

We present a methodology for creating a lexicon for Hijazi, a low-resource Arabic dialect spoken in Saudi Arabia. We describe how Hijazi differs from Modern Standard Arabic, and we recruit native speakers to annotate articles and tweets. The Hijazi lexicon is adapted from two resources, Sebawai and the Quranic Arabic Corpus, and is built both manually and automatically using Hijazi morphology. We detail the methodology used to build the lexicon and report the results of an evaluation of the corpus formation process.
