MICHAEL: Mining Character-level Patterns for Arabic Dialect Identification (MADAR Challenge)

We present MICHAEL, a simple, lightweight method for automatic Arabic Dialect Identification (DID) on the MADAR travel-domain task. MICHAEL relies only on character-level features, so it classifies sentences without any pre-processing. More precisely, character n-grams extracted from the original sentences are used to train a Multinomial Naive Bayes classifier. The system achieved an official score (accuracy) of 53.25% with n-gram lengths 1 <= N <= 3, but performed much better with character 4-grams (62.17% accuracy).
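The pipeline described above (character n-grams fed to a Multinomial Naive Bayes classifier) can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `char_ngrams` and `CharNgramNB`, the add-one smoothing (`alpha=1.0`), and the choice to skip n-grams unseen in training are all hypothetical, and the toy strings and labels "A"/"B" are placeholders rather than MADAR data.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=1, n_max=3):
    """Return all character n-grams of `text` with n_min <= n <= n_max."""
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


class CharNgramNB:
    """Multinomial Naive Bayes over raw character n-gram counts.

    Assumption: add-alpha smoothing; the abstract does not state
    which smoothing the original system used.
    """

    def __init__(self, n_min=1, n_max=3, alpha=1.0):
        self.n_min, self.n_max, self.alpha = n_min, n_max, alpha

    def fit(self, sentences, labels):
        self.class_counts = Counter(labels)
        self.ngram_counts = defaultdict(Counter)
        for sentence, label in zip(sentences, labels):
            grams = char_ngrams(sentence, self.n_min, self.n_max)
            self.ngram_counts[label].update(grams)
        # Vocabulary = every n-gram seen in any training sentence.
        self.vocab = set()
        for counts in self.ngram_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.ngram_counts[c].values())
                       for c in self.class_counts}
        return self

    def predict(self, sentence):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            # Log prior plus summed log likelihoods of the n-grams.
            score = math.log(self.class_counts[label] / n_docs)
            denom = self.totals[label] + self.alpha * vocab_size
            for gram in char_ngrams(sentence, self.n_min, self.n_max):
                if gram in self.vocab:  # skip n-grams never seen in training
                    count = self.ngram_counts[label][gram]
                    score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy usage with placeholder strings and labels (not real dialect data).
train_sentences = ["aaab", "abaa", "bbba", "babb"]
train_labels = ["A", "A", "B", "B"]
model = CharNgramNB(n_min=1, n_max=3).fit(train_sentences, train_labels)
prediction_a = model.predict("aaba")
prediction_b = model.predict("bbab")
```

Under these assumptions, the 4-gram configuration mentioned in the abstract would correspond to `CharNgramNB(n_min=4, n_max=4)`. Because the features are taken directly from the raw string, no tokenization or normalization step is involved.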
