Improving Arabic Part-of-Speech Tagging through Morphological Analysis

This paper describes our newly developed second-order hidden Markov model (HMM) part-of-speech (POS) tagging system, designed specifically to tag Arabic texts using a small amount of training data. The tagger achieves encouraging results. The paper also presents a hybrid tagging architecture for Arabic, in which our tagger is augmented with a weighted morphological analyzer. Finally, we compare the tagger's results when run standalone and when combined with a high-coverage morphological analyzer. Experimental results obtained with a small training corpus are presented and discussed. The experiments show that the best proposed hybrid architecture significantly improves POS tagging accuracy for unknown words. A precision of 96.6% is obtained when unknown words occur in the test set.
