Viewing stemming as recall enhancement

Previous research on stemming has shown both positive and negative effects on retrieval performance. This paper describes an experiment in which several linguistic and non-linguistic stemmers are evaluated on a Dutch test collection. The experiments focus in particular on the measurement of Recall. Results show that linguistic stemming restricted to inflection yields a significant improvement over both full linguistic stemming and non-linguistic stemming, in average Precision as well as in R-Recall. The best results are obtained with a linguistic stemmer that is enhanced with compound analysis. This version has significantly better Recall than a system without stemming, without a significant deterioration of Precision.
