The effectiveness of stemming for natural‐language access to Slovene textual data

Several studies have examined the use of stemming algorithms for conflating morphological variants in free-text retrieval systems. Comparisons of stemmed and nonconflated searches suggest that stemming yields no significant increase in retrieval effectiveness when applied to English-language documents and queries. This article reports the use of stemming on Slovene-language documents and queries, and demonstrates that an appropriate stemming algorithm produces a large, statistically significant increase in retrieval effectiveness compared with nonconflated processing; manual right-hand truncation gives a similar improvement. A comparison with stemming of English versions of the same documents and queries leads to the conclusion that the effectiveness of a stemming algorithm is determined by the morphological complexity of the language that it is designed to process. © 1992 John Wiley & Sons, Inc.
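
The contrast between nonconflated matching and right-hand truncation can be illustrated with a minimal sketch. It is not the stemming algorithm or the test collection used in the article: the function names (`exact_match`, `truncation_match`) and the word lists are assumptions chosen for illustration, using inflected forms of the Slovene noun "knjiga" (book) and the English forms "book" and "books".

```python
# Illustrative only: the function names and word lists are assumptions,
# not taken from the article or its test collection.

def exact_match(query_term, doc_terms):
    """Nonconflated matching: a document term matches only if identical."""
    return [t for t in doc_terms if t == query_term]


def truncation_match(truncated_term, doc_terms):
    """Right-hand truncation: match every term that begins with the prefix."""
    return [t for t in doc_terms if t.startswith(truncated_term)]


# Inflected forms of the Slovene noun "knjiga" (book).
slovene_forms = [
    "knjiga", "knjige", "knjigi", "knjigo", "knjig",
    "knjigam", "knjigah", "knjigami", "knjigama",
]

# The corresponding English noun has only two forms.
english_forms = ["book", "books"]

if __name__ == "__main__":
    for label, query, prefix, forms in [
        ("Slovene", "knjiga", "knjig", slovene_forms),
        ("English", "book", "book", english_forms),
    ]:
        exact = exact_match(query, forms)
        truncated = truncation_match(prefix, forms)
        print(f"{label}: exact matches {len(exact)} of {len(forms)} forms, "
              f"truncation matches {len(truncated)} of {len(forms)}")
```

In this toy setting an exact search for "knjiga" retrieves one of the nine Slovene forms, whereas the English query "book" already retrieves one of only two. That is in line with the abstract's conclusion that conflation matters more in a morphologically richer language.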