Machine translation for Arabic dialects (survey)

Arabic dialects, also called colloquial Arabic or vernaculars, are spoken varieties of Standard Arabic. These dialects have a mixed form with many variations, owing to the influence of ancient local tongues and of other languages, such as European ones. Many of these dialects are mutually incomprehensible. Until recently, Arabic dialects were not written and were used only in spoken form. Nowadays, with the advent of the internet and mobile telephony, they are increasingly used in written form, since this kind of communication has brought everyday conversations into a written format. This allows Arab people to use their dialects, which are their actual native languages, to express their opinions on social media, to chat, to text, and so on. This growing use opens a new research direction for Arabic natural language processing (NLP). In this paper, we focus on machine translation in the context of Arabic dialects and survey recent research in this area. For each study, we give a detailed description of the adopted approach and report its most relevant contribution.
