A rule-based approach for tagging non-vocalized Arabic words

In this work, we present a tagging system that assigns tags to the words of a non-vocalized Arabic text. The proposed system applies three levels of analysis. The first level is a lexical analyzer built on a lexicon containing all fixed words and particles, such as prepositions and pronouns. The second level is a morphological analyzer that uses word structure, namely patterns and affixes, to determine the word class. The third level is a syntax analyzer, or grammatical tagger, which assigns grammatical tags to words based on their context or position in the sentence. The syntax analyzer consists of two stages: the first stage relies on specific keywords that indicate the tag of the following word; the second stage is a reversed parsing technique that scans the available grammars of the Arabic language to determine the class of a single ambiguous word in the sentence. We tested the proposed system on a corpus of 2355 words. Experimental results showed that the system achieved a success rate approaching 94% of the total number of words in the sample used in the study.
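
The three levels described above can be illustrated with a minimal sketch. It is not the authors' implementation: the lexicon entries, the affix rules, the tag names, and the function names (`lexical`, `morphological`, `tag`) are all hypothetical and chosen only to show how a lexicon lookup, an affix check, and a keyword-based context rule might be chained. The reversed parsing stage is omitted, since the abstract gives no detail on the grammars it uses.

```python
# Toy three-level tagger. All data and rules here are illustrative
# assumptions, not taken from the paper.

# Level 1: lexicon of fixed words and particles (hypothetical entries).
LEXICON = {
    "في": "PREP",   # "in"
    "من": "PREP",   # "from"
    "على": "PREP",  # "on"
    "هو": "PRON",   # "he"
    "لم": "NEG",    # negation particle, followed by a verb
}

# Level 3, stage 1: tag of a keyword -> tag assumed for the next word.
NEXT_TAG = {"PREP": "NOUN", "NEG": "VERB"}


def lexical(word):
    """Return the tag of a fixed word or particle, or None."""
    return LEXICON.get(word)


def morphological(word):
    """Guess the word class from a few affixes, or return None."""
    if word.startswith("ال") and len(word) > 3:   # definite article prefix
        return "NOUN"
    if word.endswith("ات") and len(word) > 3:     # feminine plural suffix
        return "NOUN"
    if word.startswith("ي") and word.endswith("ون") and len(word) > 4:
        return "VERB"                             # imperfect plural form
    return None


def tag(words):
    """Tag a list of words using the lexicon, then affixes, then context."""
    tags = [lexical(w) or morphological(w) for w in words]
    # Context rule: an untagged word takes the tag that the preceding
    # keyword's tag predicts, if any.
    for i in range(1, len(words)):
        if tags[i] is None:
            tags[i] = NEXT_TAG.get(tags[i - 1])
    return [(w, t or "UNKNOWN") for w, t in zip(words, tags)]


if __name__ == "__main__":
    for word, label in tag(["هو", "في", "البيت"]):
        print(word, label)
```

In this sketch each level only fills in words that the earlier levels left untagged, which mirrors the ordering described in the abstract; a word that no rule covers is reported as `UNKNOWN` rather than guessed.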
