Arabic Light Stemmer: A New Enhanced Approach

Word stemming is one of the most important factors affecting the performance of information retrieval systems. The optimization of Arabic light stemming algorithms, a core component of natural language processing and information retrieval for Arabic, rests on the language's root-pattern schemes. Because Arabic is a highly inflected language with a more complex morphological structure than English, effective information retrieval requires better stemming algorithms. This paper reports on enhancements to the TREC-2002 Arabic light stemmer presented by Kareem Darwish of the University of Maryland. Five stemming algorithms are proposed that produce significantly better Arabic stemming results than the TREC-2002 algorithm.
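To make the idea of light stemming concrete, the following is a minimal Python sketch of affix stripping. It is not Darwish's TREC-2002 stemmer and not any of the five algorithms proposed in this paper. The function name `light_stem`, the affix lists (the conjunction و, definite-article variants, and a handful of common suffixes), and the minimum-length thresholds are all hypothetical choices made for illustration.

```python
# Minimal Arabic light stemmer sketch.
# ASSUMPTION: the affix lists and length thresholds below are illustrative,
# not the TREC-2002 rules or the rules proposed in this paper.

# Conjunction prefix "wa-" ("and").
WAW = "و"

# Definite-article variants; only the first matching one is stripped.
ARTICLES = ["وال", "بال", "كال", "فال", "لل", "ال"]

# Common suffixes, each tried once, in this order.
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]


def light_stem(word: str) -> str:
    """Strip a small set of frequent affixes without extracting the root."""
    # Strip a leading waw only if at least 3 characters remain.
    if word.startswith(WAW) and len(word) - 1 >= 3:
        word = word[1:]

    # Strip at most one definite-article variant if at least 2 characters remain.
    for art in ARTICLES:
        if word.startswith(art) and len(word) - len(art) >= 2:
            word = word[len(art):]
            break

    # Try each suffix once, in list order, keeping at least 2 characters.
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= 2:
            word = word[: -len(suf)]

    return word


if __name__ == "__main__":
    print(light_stem("والكتاب"))  # → كتاب
```

Light stemming only removes frequent affixes and never tries to recover the root. This keeps it fast, but it can over-stem. With these rules, for example, والد ("parent") becomes الد, because its initial و belongs to the word itself and is not the conjunction.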
