Special issue on Machine Translation for Arabic: Preface

Over the last decade, much Machine Translation (MT) research has focused on the Arabic language. Arabic is the official language, or one of the official languages, of 25 countries. The script it uses, which bears its name, is also used to write many unrelated languages around the world, such as Persian, Kurdish, Urdu and Pashto. Conversely, some of Arabic’s closest Semitic sisters, namely Hebrew, Syriac and Maltese, are written in other scripts.

Arabic brings many challenges to computational processing in general and to MT in particular. Arabic orthography uses optional diacritical marks, mostly for vowelization but also for consonantal doubling. The orthography also allows cliticization of single-letter conjunctions and prepositions, in addition to possessive/accusative pronouns. Together, these two features lead to high degrees of ambiguity and sparsity. Arabic morphology uses a mix of concatenative and templatic morphemes and has a large number of inflectional features. Syntactically, Arabic has both subject–verb and verb–subject orders, which is particularly challenging for parsing and MT because translating to English can require long-distance movements.

Finally, the Arabic language is a collection of multiple variants, one of which has a special status as the formal written standard of the media, culture and education across the Arab World: Modern Standard Arabic (MSA). The other variants are informal spoken dialects that are the media of communication for daily life. In this special issue, we focus on challenges and solutions for translation to and from Arabic. The bulk of the articles in this special issue fall into either morphology-based or syntax-based solutions.