Issues in Arabic Morphological Analysis
The salient issues facing contemporary Arabic morphological analysis are predominantly orthographic in nature, although integrating morphological analysis of the dialects into the existing morphological analysis of Modern Standard Arabic is identified as the primary challenge of the next decade. Orthographic issues that affect morphological analysis stem in part from the successful deployment of the Unicode standard and the subsequent increase in usage of the expanded Arabic character set, including what are properly Persian and Urdu characters. Further orthographic issues arise from the persistent and widespread variation in the spelling of letters such as hamza and tā’ marbūTa, and from the increasing lack of differentiation between word-final yā’ and alif maqSūra. The tokenization of Arabic input strings is also affected by orthography, as typists often neglect to insert a space after words that end with a non-connector letter. An increasing number of archaic morphological features and dated lexical items can be observed in Web-based Islamic publications and cannot be overlooked in contemporary analysis. Finally, the accuracy and completeness of current Arabic morphological analysis can be questioned in light of the almost complete absence of annotation for lexically determined features of gender, number, and humanness.
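To make the orthographic issues above concrete, here is a minimal Python sketch of how an analyzer's input might be folded into a lookup key, and how run-together words might be flagged. It is not the method described in the paper: the folding table, the choice to collapse word-final alif maqSūra into yā’ and tā’ marbūTa into hā’, and the function names `lookup_key` and `candidate_splits` are all illustrative assumptions. Folding of this kind loses information, so a real system would presumably keep the original string alongside the key.

```python
import re

# Hypothetical folding table (an assumption for illustration, not taken
# from the paper): Persian/Urdu code points and hamza-bearing alif forms
# are mapped onto a single Arabic code point each.
FOLD = str.maketrans({
    "\u06CC": "\u064A",  # FARSI YEH -> ARABIC YEH
    "\u06A9": "\u0643",  # KEHEH -> ARABIC KAF
    "\u0622": "\u0627",  # ALEF WITH MADDA ABOVE -> ALEF
    "\u0623": "\u0627",  # ALEF WITH HAMZA ABOVE -> ALEF
    "\u0625": "\u0627",  # ALEF WITH HAMZA BELOW -> ALEF
})

# A letter counts as word-final here when no word character follows it.
FINAL_ALEF_MAKSURA = re.compile("\u0649(?!\\w)")
FINAL_TEH_MARBUTA = re.compile("\u0629(?!\\w)")

# Letters that do not join to a following letter (non-connectors).
NON_CONNECTORS = set(
    "\u0627\u0622\u0623\u0625"  # alif and its hamza/madda variants
    "\u062F\u0630\u0631\u0632"  # dal, thal, reh, zain
    "\u0648\u0624"              # waw, waw with hamza
    "\u0629\u0649\u0621"        # teh marbuta, alef maksura, bare hamza
)


def lookup_key(text: str) -> str:
    """Fold common spelling variants so that variants share one key."""
    folded = text.translate(FOLD)
    folded = FINAL_ALEF_MAKSURA.sub("\u064A", folded)
    folded = FINAL_TEH_MARBUTA.sub("\u0647", folded)
    return folded


def candidate_splits(token: str) -> list[int]:
    """Return offsets after each non-final non-connector letter.

    Each offset is a position where a space may have been omitted; a
    lexicon would still be needed to decide which split, if any, is right.
    """
    return [i + 1 for i, ch in enumerate(token[:-1]) if ch in NON_CONNECTORS]
```

For instance, `lookup_key` gives the same key for a word typed with Arabic yā’, Farsi yeh, or alif maqSūra in final position, and `candidate_splits` proposes the offset between hādhā and al-kitāb when the two are typed without a space. The split offsets are only candidates, since non-connector letters also occur freely inside ordinary words.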