Monolingual document retrieval: English versus other European languages

The vast majority of research in information retrieval is done using English collections and topics. This raises questions about the effectiveness of retrieval strategies for other languages. To examine this issue, we focus on document retrieval in nine European languages. In particular, we investigate the effectiveness of language-dependent approaches to document retrieval, such as stemming and decompounding; of language-independent approaches, such as character n-gramming; and of the combination of the two types of approaches. The experimental evidence is obtained using the 2003 test suite of the Cross-Language Evaluation Forum (CLEF).
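As a rough illustration of the language-independent approach mentioned above, character n-gramming can be sketched as follows. This is a minimal sketch only: the function name `char_ngrams`, the choice of n = 4, the restriction to within-word n-grams, and the handling of short words are all assumptions made for illustration, not the configuration used in the experiments.

```python
def char_ngrams(text, n=4):
    """Split text into overlapping character n-grams, word by word.

    Assumptions (illustrative only): tokens are whitespace-separated,
    text is lowercased, and words no longer than n are kept whole.
    """
    grams = []
    for word in text.lower().split():
        if len(word) <= n:
            # Short words cannot be split further; index them as-is.
            grams.append(word)
        else:
            # Slide a window of width n across the word.
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


if __name__ == "__main__":
    print(char_ngrams("Monolingual retrieval"))
```

Because the n-grams are built from the surface characters alone, the same function applies unchanged to any of the languages considered, which is what makes the approach language-independent; stemming and decompounding, by contrast, require language-specific rules or lexicons.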
