TNO at CLEF-2001: Comparing Translation Resources

This paper describes the official runs of TNO TPD for CLEF-2001. We participated in the monolingual, bilingual and multilingual tasks. The main contribution of this paper is a systematic comparison of three types of translation resources for bilingual retrieval based on query translation. We compared several techniques based on machine-readable dictionaries and on statistical dictionaries generated from parallel corpora against a baseline formed by the Babelfish machine translation (MT) service, which is available on the web. The study showed that the topic set is too small to draw reliable conclusions. All three methods have the potential to reach about 90% of the monolingual baseline performance, but their effectiveness is not consistent across language pairs and topic collections. Because each of the individual methods is quite sensitive to missing translations, we tested a combination approach, which yielded consistent improvements, reaching up to 98% of the monolingual baseline.
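The abstract does not say how the translation resources are combined. As a rough illustration of why a combination can cover the gaps that hurt each individual resource, here is a minimal sketch that assumes a simple linear interpolation of per-resource translation probabilities. The function names, the weights, and the toy Dutch-to-English tables are hypothetical and not taken from the paper. A term that is missing from one resource simply receives no contribution from it, so the other resources can still supply candidate translations. The sketch also expresses bilingual effectiveness as a percentage of the monolingual baseline, the measure the abstract reports.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical translation table: source term -> {target term: P(target | source)}.
TranslationTable = Dict[str, Dict[str, float]]


def combine_translations(term: str,
                         resources: List[TranslationTable],
                         weights: List[float]) -> Dict[str, float]:
    """Linearly interpolate translation probabilities for one query term.

    Assumption: combination by weighted sum, renormalised to sum to 1.
    A resource with no entry for `term` contributes nothing, so the
    remaining resources fill its gap.
    """
    scores: Dict[str, float] = defaultdict(float)
    for table, weight in zip(resources, weights):
        for target, prob in table.get(term, {}).items():
            scores[target] += weight * prob
    total = sum(scores.values())
    if total == 0.0:
        return {}  # no resource could translate the term
    return {target: score / total for target, score in scores.items()}


def relative_effectiveness(bilingual_map: float, monolingual_map: float) -> float:
    """Bilingual mean average precision as a percentage of the monolingual baseline."""
    return 100.0 * bilingual_map / monolingual_map


if __name__ == "__main__":
    # Toy, made-up tables standing in for the three resource types.
    dictionary = {"fiets": {"bicycle": 0.6, "bike": 0.4}}
    parallel_corpus = {"fiets": {"bike": 0.7, "cycling": 0.3}}
    mt_output = {"fiets": {"bicycle": 1.0}}  # MT yields one translation per term
    print(combine_translations("fiets",
                               [dictionary, parallel_corpus, mt_output],
                               [1 / 3, 1 / 3, 1 / 3]))
```

Equal weights are only a neutral starting point; in practice one would tune them on training topics, keeping in mind the abstract's warning that the topic set is small.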
