An Ambiguity-Controlled Morphological Analyzer for Modern Standard Arabic Modeling Finite State Networks

Morphological ambiguity is a major concern for syntactic parsers, POS taggers and other NLP tools. For example, the more morphological analyses a lexical entry receives, the longer a parser takes to analyze a sentence and the more parses it produces. Xerox Arabic Finite State Morphology and Buckwalter Arabic Morphological Analyzer are two of the best-known, well-documented morphological analyzers for Modern Standard Arabic (MSA). Yet both systems have significant problems in design as well as coverage that increase the ambiguity rate. This paper shows how an ambiguity-controlled morphological analyzer for Arabic is built as a rule-based system that takes the stem as the base form and uses finite state technology. The paper also points out sources of legal and illegal ambiguities in MSA and shows how ambiguity in the new system is reduced without compromising precision. Finally, Xerox, Buckwalter, and our system are evaluated, and their performance is compared and analyzed.
