Paper information - Porter's stemming algorithm for Dutch

A stemming algorithm provides a simple means of improving recall in text retrieval systems. This paper describes the development of a Dutch version of the Porter stemming algorithm. The stemmer was evaluated using a method inspired by Paice (1994). The evaluation is based on a list of groups of morphologically related words; ideally, all words in a group are stemmed to the same root. The result of applying the stemmer to these groups is used to calculate the Understemming Index and the Overstemming Index. These two parameters, together with the diversity of stem group categories that could be generated from the CELEX database, enabled a careful analysis of the effects of each stemming rule. The test suite is well suited to a qualitative comparison of different stemmers or of different versions of one stemmer.
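The two indices can be illustrated with a small sketch. It is not the paper's code: the function name `paice_indices`, the sample word groups, and the toy stemmers are all hypothetical, and the formulas follow the pairwise merge counts as they are commonly stated for Paice's method (unachieved merges within a group for understemming, wrong merges across groups for overstemming).

```python
from collections import Counter, defaultdict


def paice_indices(groups, stem):
    """Return (understemming_index, overstemming_index).

    groups: list of lists of distinct, morphologically related words.
    stem:   any callable mapping a word to its stem (hypothetical here).
    """
    total_words = sum(len(g) for g in groups)
    desired_merge = unachieved_merge = desired_nonmerge = 0.0
    by_stem = defaultdict(Counter)  # stem -> {group id: word count}

    for gid, group in enumerate(groups):
        n = len(group)
        stems = Counter(stem(w) for w in group)
        # Pairs inside the group that should end up with one stem.
        desired_merge += 0.5 * n * (n - 1)
        # Pairs inside the group that received different stems.
        unachieved_merge += 0.5 * sum(u * (n - u) for u in stems.values())
        # Pairs of (word in group, word outside group) that should stay apart.
        desired_nonmerge += 0.5 * n * (total_words - n)
        for s, count in stems.items():
            by_stem[s][gid] += count

    wrong_merge = 0.0
    for counts in by_stem.values():
        size = sum(counts.values())
        # Pairs sharing this stem although they come from different groups.
        wrong_merge += 0.5 * sum(v * (size - v) for v in counts.values())

    ui = unachieved_merge / desired_merge if desired_merge else 0.0
    oi = wrong_merge / desired_nonmerge if desired_nonmerge else 0.0
    return ui, oi


# Illustrative groups only; they are not taken from the CELEX-based test suite.
groups = [["werk", "werken", "werkte"], ["wet", "wetten"]]

print(paice_indices(groups, lambda w: w))      # no stemming at all
print(paice_indices(groups, lambda w: w[:2]))  # overly aggressive truncation
```

Leaving words unstemmed maximizes understemming, while truncating every word to two letters merges both groups and maximizes overstemming; a useful stemmer lies between these extremes.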

Wessel Kraaij

[1] Werkgroep Frequentie-onderzoek van het Nederlands, et al. Woordfrequenties in geschreven en gesproken Nederlands, 1975.

[2] Martin F. Porter, et al. An algorithm for suffix stripping, 1997, Program.

[3] Donna K. Harman, et al. How effective is suffixing?, 1991, J. Am. Soc. Inf. Sci.

[4] Peter Willett, et al. The Effectiveness of Stemming for Natural-Language Access to Slovene Textual Data, 1992, J. Am. Soc. Inf. Sci.

[5] Venkata Subramaniam, et al. Information Retrieval: Data Structures & Algorithms, 1992.

[6] Robert Krovetz, et al. Viewing morphology as an inference process, 1993, Artif. Intell.

[7] Chris D. Paice. An evaluation method for stemming algorithms, 1994, SIGIR '94.

[8] R. H. Baayen, et al. The CELEX Lexical Database (CD-ROM), 1996.