A structural rule-based stemmer for Persian

This paper presents a new stemmer for the Persian language. We take a structural approach to stemming, using the structure of words and the morphological rules of the language to recognize the stem of each word. The stemmer is described by 33 rules, written according to Persian morphology and its word-derivation structure. For evaluation, we used the stemmer in an information retrieval system. The results show that adding the stemmer increases the system's precision by 4.78% and decreases the index file size by 6%.
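
As a rough illustration of what a rule-based suffix-stripping stemmer looks like, the following minimal Python sketch applies an ordered list of suffix rules to a word. It is not the stemmer described in the abstract: the rule table `SUFFIX_RULES`, the minimum stem lengths, and the function name `stem` are all assumptions made for this example, and only a handful of well-known Persian suffixes (plural, comparative, superlative) are covered rather than the paper's 33 rules.

```python
# Hypothetical rule table: (suffix, minimum length the remaining stem must have).
# These entries are illustrative and are NOT the 33 rules from the paper.
# Longer suffixes come first so that "ترین" is tried before "تر".
SUFFIX_RULES = [
    ("ترین", 2),  # superlative marker
    ("تر", 2),    # comparative marker
    ("های", 2),   # plural marker followed by ezafe
    ("ها", 2),    # plural marker
    ("ان", 3),    # plural marker (assumed to need a longer stem)
]


def stem(word: str) -> str:
    """Strip the first matching suffix, provided the remaining stem is long enough."""
    for suffix, min_stem_len in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    # No rule applied: the word is treated as its own stem.
    return word


if __name__ == "__main__":
    for w in ["کتابها", "بزرگترین", "تر"]:
        print(w, "->", stem(w))
```

The minimum-length check is one simple way to avoid stripping a suffix from a word that merely happens to end in the same letters; a full structural stemmer would presumably rely on richer morphological conditions.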
