A New Stemmer for the Farsi Language

In this paper, we report on the design and implementation of a stemmer for the Farsi language that combines Kazem Taghva's method with an improved version of Krovetz's method. The first method removes suffixes and prefixes according to the word's structure. The second method relies on information stored in a database. Our stemmer is a combination of the two. The results of our evaluation on a small Farsi document collection show a significant improvement in precision and recall.
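As a rough illustration of how structural affix removal and a stored word list can work together, the following minimal sketch strips one affix at a time and stops as soon as the remaining form is found in a lexicon. It is not the implementation described in this paper: the tiny `LEXICON`, the `SUFFIXES` and `PREFIXES` lists, the `MIN_STEM_LEN` threshold, the function names, and the order in which the two steps are combined are all assumptions made for the example, and an in-memory set stands in for the database.

```python
# Hypothetical toy data, chosen only for illustration. A real system would
# load the lexicon from a database and use a much fuller affix inventory.
LEXICON = {"کتاب", "بزرگ", "باران"}           # book, big, rain
SUFFIXES = ["ترین", "های", "ها", "تر", "ان"]   # superlative, plural forms, comparative
PREFIXES = ["نمی", "می"]                       # verbal prefixes
MIN_STEM_LEN = 2                               # assumed lower bound on stem length


def strip_affix_once(word):
    """Remove a single affix, trying longer affixes first.

    Suffixes are tried before prefixes. The word is returned unchanged
    when no affix matches or when the remainder would be too short.
    """
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LEN:
            return word[len(prefix):]
    return word


def stem(word, lexicon=LEXICON):
    """Strip affixes until the form is in the lexicon or nothing more can be removed."""
    current = word
    while current not in lexicon:
        shorter = strip_affix_once(current)
        if shorter == current:  # no further affix applies
            break
        current = shorter
    return current


if __name__ == "__main__":
    for w in ["کتابها", "بزرگترین", "باران"]:
        print(w, "->", stem(w))
```

In this sketch the lexicon check guards against over-stemming: "باران" ends in the plural-like suffix "ان", but because it is listed as a word it is returned unchanged, whereas with an empty lexicon the same rules would reduce it to "بار".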
