Shallow Morphological Analysis in Monolingual Information Retrieval for Dutch, German, and Italian

This paper describes our team's experiments for CLEF 2001, covering both official and post-submission runs. We took part in the monolingual task for Dutch, German, and Italian. Our experiments focused on the effects of morphological analysis, such as stemming and compound splitting, on retrieval effectiveness. Confirming earlier reports on retrieval in compounding languages such as Dutch and German, we found improvements of around 25% for German and as much as 69% for Dutch. For Italian, lexicon-based stemming yielded gains of up to 25%.
