A NEW STEMMER TO IMPROVE INFORMATION RETRIEVAL

Stemming is a technique used to reduce words to their root form by removing derivational and inflectional affixes. It is widely used in information retrieval tasks, and many researchers have demonstrated that stemming improves the performance of information retrieval systems. The Porter stemmer is the most common algorithm for English stemming. However, this algorithm has several drawbacks, since its simple rules cannot fully describe English morphology, and the errors it makes may affect information retrieval performance. The present paper proposes an improved version of the original Porter stemming algorithm for the English language. The proposed stemmer is evaluated using the error counting method, in which the performance of a stemmer is computed by counting its understemming and overstemming errors. The results show an improvement in stemming accuracy compared with the original stemmer, and also compared with other stemmers such as the Paice and Lovins stemmers. We show, in addition, that the new version of the Porter stemmer affects information retrieval performance.
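The error counting idea described in the abstract can be illustrated with a minimal sketch. It assumes a hand-built list of word groups, where words in one group should share a stem and words in different groups should not. An understemming error is counted for each same-group pair that receives different stems, and an overstemming error for each cross-group pair that receives the same stem. The function names `toy_stem` and `count_errors` and the sample groups are hypothetical. `toy_stem` is a deliberately naive suffix stripper, not the Porter algorithm or the stemmer proposed in the paper, and the raw pair counts here are a simplification that omits any normalization an evaluation method such as Paice's may apply.

```python
from itertools import combinations


def toy_stem(word):
    """Hypothetical naive suffix stripper (NOT the Porter algorithm)."""
    for suffix in ("ing", "ed", "s"):
        # Strip the first matching suffix, keeping at least 3 characters.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def count_errors(groups, stem):
    """Return (understemming, overstemming) pair counts for a stemmer.

    groups: list of word lists; words in one list should share a stem.
    stem:   function mapping a word to its stem.
    """
    labelled = [(stem(w), gid) for gid, words in enumerate(groups) for w in words]
    under = over = 0
    for (s1, g1), (s2, g2) in combinations(labelled, 2):
        if g1 == g2 and s1 != s2:
            under += 1  # related words that were not conflated
        elif g1 != g2 and s1 == s2:
            over += 1  # unrelated words that were wrongly conflated
    return under, over


# Hypothetical sample data, chosen only to trigger both error types.
groups = [
    ["connect", "connected", "connecting", "connection"],
    ["new", "newer"],
    ["news"],
]

under, over = count_errors(groups, toy_stem)
print(under, over)  # 4 1
```

In this sample, "connection" and "newer" are left unconflated with their groups (four understemming pairs), while "news" is stripped to "new" and collides with an unrelated word (one overstemming pair). A stemmer with fewer errors of both kinds on the same groups would be judged more accurate under this simplified count.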

Wahiba Ben Abdessalem Karaa

[1] Patrick Pantel, et al. From Frequency to Meaning: Vector Space Models of Semantics, 2010, J. Artif. Intell. Res.

[2] Chris D. Paice. An evaluation method for stemming algorithms, 1994, SIGIR '94.

[3] Julie Beth Lovins, et al. Development of a stemming algorithm, 1968, Mech. Transl. Comput. Linguistics.

[4] Chris D. Paice, et al. Another stemmer, 1990, SIGF.

[5] M. F. Porter, et al. An algorithm for suffix stripping, 1997.

[6] Gerard Salton, et al. Term-Weighting Approaches in Automatic Text Retrieval, 1988, Inf. Process. Manag.

[7] Gerard Salton, et al. A vector space model for automatic indexing, 1975, CACM.

[8] Patrick Pantel, et al. From frequency to meaning, 2010.