Elissa: A Dialectal to Standard Arabic Machine Translation System

Modern Standard Arabic (MSA) has a wealth of natural language processing (NLP) tools and resources. In comparison, resources for dialectal Arabic (DA), the unstandardized spoken varieties of Arabic, are still lacking. We present Elissa, a machine translation (MT) system from DA to MSA. Elissa (version 1.0) employs a rule-based approach that relies on morphological analysis, morphological transfer rules, and dictionaries, in addition to language models, to produce MSA paraphrases of dialectal sentences. Elissa can be employed as a general preprocessor for dialectal Arabic when using MSA NLP tools.
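The combination of dictionary lookup and language-model selection described in the abstract can be illustrated with a minimal sketch. This is not Elissa's implementation: the dictionary entries, the bigram counts, and the function names (`candidates`, `lm_score`, `translate`) are all hypothetical, and the morphological analysis and transfer rules that the real system relies on are omitted. The sketch only shows the general idea of generating MSA candidates per dialectal word and letting a language model choose among them.

```python
from itertools import product
from math import log

# Hypothetical toy dictionary: dialectal word -> candidate MSA paraphrases.
DA_TO_MSA = {
    "مش": ["ليس", "لا"],
    "عايز": ["يريد"],
    "دلوقتي": ["الآن"],
}

# Hypothetical bigram counts standing in for an MSA language model.
BIGRAM_COUNTS = {
    ("<s>", "لا"): 5,
    ("لا", "يريد"): 8,
    ("<s>", "ليس"): 4,
    ("ليس", "يريد"): 1,
    ("يريد", "الآن"): 3,
    ("الآن", "</s>"): 6,
}


def candidates(word):
    """Return MSA candidates for a word; unknown words pass through unchanged."""
    return DA_TO_MSA.get(word, [word])


def lm_score(tokens):
    """Score a token sequence as a sum of add-one log bigram counts (unnormalized)."""
    padded = ["<s>", *tokens, "</s>"]
    return sum(log(BIGRAM_COUNTS.get(pair, 0) + 1) for pair in zip(padded, padded[1:]))


def translate(sentence):
    """Pick the highest-scoring combination of per-word MSA candidates."""
    options = [candidates(word) for word in sentence.split()]
    best = max(product(*options), key=lm_score)
    return " ".join(best)


if __name__ == "__main__":
    print(translate("مش عايز دلوقتي"))
```

Enumerating every combination with `product` is exponential in sentence length, so it is only suitable for a toy example; a lattice with dynamic-programming search would be the usual choice at scale.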
