Different approaches to Cross Language Information Retrieval

This paper describes two experiments in the domain of Cross Language Information Retrieval (CLIR). Our basic approach is to translate queries word by word using machine-readable dictionaries. The first experiment compared three strategies for dealing with word sense ambiguity: i) keeping all translations and integrating translation probabilities in the model; ii) selecting a single translation on the basis of its number of occurrences in the dictionary; iii) translating word by word after word sense disambiguation in the source language. In the second experiment we constructed parallel corpora from web documents, in order to build bilingual dictionaries or improve translation probability estimates. We conclude that our best dictionary-based CLIR approach keeps all possible translations, not by simply substituting a query term with its translations, but by creating a structured query and including reverse translation probabilities in the retrieval model.
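The difference between strategies i) and ii) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy dictionary, its occurrence counts, the function names, and the choice to estimate a translation weight as a normalized count. It does not reproduce the retrieval model or the reverse translation probabilities used in the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical toy dictionary: source term -> {translation: occurrence count}.
# The entries and counts are invented for illustration only.
TOY_DICT: Dict[str, Dict[str, int]] = {
    "bank": {"bank": 3, "oever": 1},
    "river": {"rivier": 2},
}


def select_most_frequent(query: List[str]) -> List[str]:
    """Strategy ii): keep only the translation with the highest count."""
    translated = []
    for term in query:
        counts = TOY_DICT.get(term)
        if counts:  # terms missing from the dictionary are skipped
            translated.append(max(counts, key=counts.get))
    return translated


def structured_query(query: List[str]) -> List[List[Tuple[str, float]]]:
    """Strategy i): keep all translations, grouped per source term.

    Each group holds (translation, weight) pairs. The weight is a
    normalized count, an assumed stand-in for a translation probability.
    """
    groups = []
    for term in query:
        counts = TOY_DICT.get(term)
        if not counts:
            continue
        total = sum(counts.values())
        groups.append([(tr, c / total) for tr, c in counts.items()])
    return groups


if __name__ == "__main__":
    print(select_most_frequent(["river", "bank"]))
    print(structured_query(["river", "bank"]))
```

Keeping one group per source term is what distinguishes a structured query from simple substitution: the alternatives for an ambiguous term stay together as weighted variants of one concept instead of being added to the query as independent terms.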
