Issues in Arabic Morphological Analysis

The salient issues facing contemporary Arabic morphological analysis are predominantly orthographic in nature, although integrating morphological analysis of the dialects into the existing analysis of Modern Standard Arabic is identified as the primary challenge of the next decade. Issues of orthography that affect morphological analysis stem in part from the successful deployment of the Unicode standard and the subsequent increase in use of the expanded Arabic character set, including characters that are properly Persian and Urdu. Additional orthographic issues arise from the persistent and widespread variation in the spelling of letters such as hamza and tā’ marbūTa, and from the increasing lack of differentiation between word-final yā’ and alif maqSūra. The tokenization of Arabic input strings is also affected by orthography, as typists often neglect to insert a space after words that end with a non-connector letter. An increasing number of archaic morphological features and dated lexical items can be observed in Web-based Islamic publications and cannot be overlooked in contemporary analysis. Finally, the accuracy and completeness of current Arabic morphological analysis can be questioned in light of the almost complete absence of annotation for the lexically determined features of gender, number, and humanness.
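The orthographic variation described above is often handled by folding variant spellings to a common form before dictionary lookup. The following is a minimal Python sketch of that idea, not a method taken from this paper: the function name fold_orthography and the particular folding table are assumptions made for illustration, and a real analyzer may prefer to generate candidate spellings rather than collapse them.

```python
# Hypothetical folding table (an illustrative assumption, not from the source).
# Code points are written as escapes to avoid visually ambiguous glyphs.
CHAR_FOLD = str.maketrans({
    "\u0622": "\u0627",  # alif with madda       -> bare alif
    "\u0623": "\u0627",  # alif with hamza above -> bare alif
    "\u0625": "\u0627",  # alif with hamza below -> bare alif
    "\u06CC": "\u064A",  # Farsi yeh (Persian)   -> Arabic yeh
    "\u06A9": "\u0643",  # keheh (Persian/Urdu)  -> Arabic kaf
})

ALIF_MAQSURA = "\u0649"
YEH = "\u064A"
TEH_MARBUTA = "\u0629"
HEH = "\u0647"


def fold_orthography(word: str) -> str:
    """Fold common spelling variants of a single word to one lookup form."""
    folded = word.translate(CHAR_FOLD)
    # Word-final alif maqsura and yeh are often not distinguished by typists.
    if folded.endswith(ALIF_MAQSURA):
        folded = folded[:-1] + YEH
    # Word-final teh marbuta is frequently typed as heh.
    elif folded.endswith(TEH_MARBUTA):
        folded = folded[:-1] + HEH
    return folded
```

Under this sketch, the same function would be applied both to the lexicon keys and to the input word, so that variant spellings meet at a single key. Folding increases ambiguity (distinct words may collapse to one form), which is why it is presented here only as one possible design choice.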