Memory-Based Morphological Analysis Generation and Part-of-Speech Tagging of Arabic

We explore the application of memory-based learning to morphological analysis and part-of-speech tagging of written Arabic, based on data from the Arabic Treebank. Morphological analysis -- the construction of all possible analyses of isolated unvoweled wordforms -- is performed as a letter-by-letter operation prediction task, where the operation encodes segmentation, part-of-speech, character changes, and vocalization. Part-of-speech tagging is carried out by a bi-modular tagger that has a subtagger for known words and one for unknown words. We report on the performance of the morphological analyzer and part-of-speech tagger. We observe that the tagger, which has an accuracy of 91.9% on new data, can be used to select the appropriate morphological analysis of words in context at a precision of 64.0 and a recall of 89.7.
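The letter-by-letter formulation can be illustrated with a minimal sketch. It assumes a fixed context window around each letter, a plain overlap distance, and k-nearest-neighbour voting. The window sizes, the function names, and the `PREFIX`/`STEM` operation labels are hypothetical placeholders, not the operation classes or feature settings used in the paper, and the two transliterated strings are toy data rather than Arabic Treebank material.

```python
from collections import Counter

PAD = "_"  # padding symbol for positions outside the word (assumed convention)


def windows(word, left=2, right=2):
    """Return one feature tuple per letter: the focus letter plus its context."""
    padded = PAD * left + word + PAD * right
    width = left + 1 + right
    return [tuple(padded[i:i + width]) for i in range(len(word))]


def overlap_distance(a, b):
    """Count the feature positions at which two instances differ."""
    return sum(x != y for x, y in zip(a, b))


def train(examples):
    """Store every (window, operation) pair in memory; no abstraction is done."""
    memory = []
    for word, operations in examples:
        if len(word) != len(operations):
            raise ValueError("need exactly one operation label per letter")
        memory.extend(zip(windows(word), operations))
    return memory


def classify(instance, memory, k=1):
    """Return the majority operation among the k nearest stored instances."""
    ranked = sorted(memory, key=lambda item: overlap_distance(instance, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def analyze(word, memory, k=1):
    """Predict one operation label for each letter of an unseen wordform."""
    return [classify(window, memory, k) for window in windows(word)]


# Toy training data with made-up operation labels (illustration only).
EXAMPLES = [
    ("wktb", ["PREFIX", "STEM", "STEM", "STEM"]),
    ("ktAb", ["STEM", "STEM", "STEM", "STEM"]),
]
MEMORY = train(EXAMPLES)
print(analyze("wktAb", MEMORY))
```

In the setting the abstract describes, each operation label would also encode part-of-speech, character changes, and vocalization, and a wordform can receive several analyses. This sketch predicts a single label sequence only.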
