Building a syntactic rules-based stemmer to improve search effectiveness for the Arabic language

Nowadays, the world is experiencing huge growth in the volume of exchanged texts, and part of that volume remains untapped. Text mining is the set of techniques that analyze these large masses of information, extract relations that may not be known beforehand, and provide solutions that support decision making. Stemming is a common requirement of these techniques: it reduces the different grammatical forms of a word to a common base form. In what follows, we discuss existing stemming methods for Arabic text, show their limits, and propose a new algorithm to improve on them.
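To make the notion of stemming concrete, the following is a minimal sketch of affix-stripping ("light") stemming for Arabic. It is not the syntactic rules-based algorithm proposed in this work; the affix lists, the minimum stem length, and the function name `light_stem` are assumptions chosen only for illustration.

```python
# Illustrative affix lists: assumptions for this sketch, not the paper's rule set.
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل"]
SUFFIXES = ["ات", "ون", "ين", "ان", "ها", "ية", "ة", "ه", "ي"]
MIN_STEM_LEN = 3  # assumed lower bound so short words are not over-stripped


def light_stem(word: str) -> str:
    """Strip at most one prefix and one suffix, trying longer affixes first."""
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LEN:
            word = word[len(prefix):]
            break
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            word = word[:-len(suffix)]
            break
    return word


if __name__ == "__main__":
    # Three surface forms of "teacher" that this sketch maps to one base form.
    for form in ["المعلمون", "والمعلمات", "معلم"]:
        print(form, "->", light_stem(form))
```

Under these assumptions, the three forms in the usage example are all reduced to the same string, which is the conflation behavior a search engine relies on. A purely affix-based approach like this one cannot tell a genuine affix from letters that belong to the word itself, which is one of the limits that motivates rule sets informed by syntax.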
