Paper information - Impact of Stemmer on Arabic Text Retrieval

Stemming is the process of reducing inflected words to their stem, base, or root form. Arabic is one of the most highly inflected languages in the world. Stemming improves retrieval performance by reducing word variants and increasing the similarity between related words, so an Arabic Information Retrieval (AIR) system can use stemming algorithms to retrieve a greater number of documents relevant to the user's query. The aim of this paper is therefore to evaluate the impact of three Arabic stemmers on AIR performance: the “Information Science Research Institute” (ISRI) stemmer, the morphological and syntax-based lemmatizer “Educated Text Stemmer” (ETS), and the Light10 stemmer. We used the Linguistic Data Consortium (LDC) Arabic Newswire data set as the benchmark. Of the three stemmers evaluated, Light10 achieved the best performance in terms of mean average precision.
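The abstract compares stemmers by mean average precision (MAP). The following is a minimal sketch of how that metric is commonly computed, not the authors' evaluation code. The function names, the query and document identifiers, and the two toy runs are all hypothetical and exist only for illustration; they are not results from the paper.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked result list against its relevant set."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at this rank, counted only where a relevant doc appears.
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Mean of per-query average precision over all judged queries."""
    if not qrels:
        return 0.0
    total = sum(
        average_precision(runs.get(query_id, []), relevant)
        for query_id, relevant in qrels.items()
    )
    return total / len(qrels)


# Hypothetical relevance judgments and ranked outputs for two toy systems.
qrels = {"q1": {"d1", "d3"}, "q2": {"d2"}}
run_a = {"q1": ["d1", "d2", "d3"], "q2": ["d2", "d1"]}
run_b = {"q1": ["d2", "d1", "d3"], "q2": ["d1", "d2"]}

map_a = mean_average_precision(run_a, qrels)
map_b = mean_average_precision(run_b, qrels)
print(f"run_a MAP = {map_a:.4f}, run_b MAP = {map_b:.4f}")
```

In a stemmer comparison of the kind described, each stemmer would produce its own set of ranked lists for the same queries, and the one with the higher MAP ranks relevant documents earlier on average.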
