An Accuracy-Enhanced Stemming Algorithm for Arabic Information Retrieval

This paper presents a method, based on natural language processing, for indexing and retrieving Arabic texts. Our approach exploits the notion of morphological templates (patterns) in word stemming and replaces words with their stems. The technique proved effective: it returned significantly more relevant results by reducing silence (relevant documents that are not retrieved) during the retrieval phase. A series of experiments was conducted to evaluate the performance of the proposed algorithm, ESAIR (Enhanced Stemmer for Arabic Information Retrieval). The results indicate that the algorithm extracts the exact root with an accuracy of up to 96%, thereby improving information retrieval.
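To make the idea of template-based stemming for indexing concrete, the following is a minimal sketch. It is not ESAIR itself, whose affix lists and template inventory are not given here. The prefix, suffix, and template lists, the `extract_root`, `build_index`, and `search` functions, and the toy documents are all hypothetical and for illustration only. Templates are written with the traditional placeholder letters ف, ع, ل standing for the three root consonants. Every other template letter must appear literally in the word.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical affix lists (not ESAIR's); longer affixes are tried first.
PREFIXES = ["وال", "بال", "ال", "و"]
SUFFIXES = ["ات", "ون", "ين", "ها", "ة"]

# Hypothetical templates: ف, ع, ل are root-consonant slots, other letters are literal.
TEMPLATES = ["فعل", "فاعل", "فعال", "فعيل", "مفعل", "مفعول"]
ROOT_SLOTS = {"ف", "ع", "ل"}


def strip_affixes(word: str) -> str:
    """Remove at most one prefix and one suffix, keeping at least 3 letters."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= 3:
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= 3:
            word = word[: -len(s)]
            break
    return word


def match_template(stem: str, template: str) -> Optional[str]:
    """Return the root if the stem fits the template letter by letter, else None."""
    if len(stem) != len(template):
        return None
    root = []
    for s_ch, t_ch in zip(stem, template):
        if t_ch in ROOT_SLOTS:
            root.append(s_ch)  # slot: capture a root consonant
        elif s_ch != t_ch:
            return None        # literal template letter must match exactly
    return "".join(root)


def extract_root(word: str) -> str:
    """Strip affixes, then return the root from the first matching template."""
    stem = strip_affixes(word)
    for template in TEMPLATES:
        root = match_template(stem, template)
        if root is not None:
            return root
    return stem  # fallback: no template matched, index the stripped stem


def build_index(docs: dict) -> dict:
    """Inverted index keyed by extracted root instead of surface word."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[extract_root(word)].add(doc_id)
    return index


def search(index: dict, query: str) -> set:
    """Return IDs of documents sharing a root with any query word."""
    hits = set()
    for word in query.split():
        hits |= index.get(extract_root(word), set())
    return hits


# Toy documents (hypothetical), each a single word for clarity.
docs = {"d1": "الكاتب", "d2": "مكتوب", "d3": "المكتبة", "d4": "ولد"}
index = build_index(docs)
print(sorted(search(index, "كتاب")))  # ['d1', 'd2', 'd3']
```

An index over surface forms would return nothing for the query كتاب, since no document contains that exact word; conflating الكاتب, مكتوب, and المكتبة under the root كتب is what reduces silence. A sketch this small will mis-stem words in which a literal template letter is actually a root consonant. Handling such ambiguity is where a real stemmer's accuracy is won or lost.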
