An Open-Source Finite State Morphological Transducer for Modern Standard Arabic

We develop an open-source, large-scale finite-state morphological processing toolkit (AraComLex) for Modern Standard Arabic (MSA), distributed under the GPLv3 license. The morphological transducer is based on a lexical database constructed specifically for this purpose. In contrast to previous resources, the database is tuned to MSA and excludes lexical entries no longer attested in contemporary use. To acquire lexical knowledge automatically, we build the database using a corpus of 1,089,111,204 words, a pre-annotation tool, machine learning techniques, and knowledge-based pattern matching. We evaluate our morphological transducer and compare it to LDC's SAMA (Standard Arabic Morphological Analyser).
