Comparing Different Approaches to Treat Translation Ambiguity in CLIR: Structured Queries vs. Target Co-occurrence Based Selection

Two main problems in cross-language information retrieval (CLIR) are translation selection and the treatment of out-of-vocabulary terms. In this paper, we focus on translation selection. Structured queries and target co-occurrence-based methods seem to be the most appropriate approaches when parallel corpora are not available, but no comparative study of the two exists. We compare the results obtained with each method, identify the weaknesses of each, and propose a hybrid method that combines both. In terms of mean average precision, results for Basque-English cross-lingual retrieval show that structured queries are the best approach for both long and short queries.
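To make the contrast between the two approaches concrete, the following is a minimal sketch, not the implementation evaluated in the paper. It assumes a structured query groups all dictionary translations of a source term under a synonym operator (written here in Indri-style `#syn` / `#combine` syntax), whereas co-occurrence-based selection keeps only the candidate that co-occurs most with the candidates of the other query terms. The function names, the toy dictionary, and the co-occurrence counts are all hypothetical and chosen only for illustration; the greedy scoring is a simplified variant.

```python
# Toy bilingual dictionary (illustrative entries, not taken from the paper).
TOY_DICTIONARY = {
    "banku": ["bank", "bench"],
    "interes": ["interest", "concern"],
}

# Toy target-language co-occurrence counts (invented numbers).
TOY_COOC = {
    frozenset(("bank", "interest")): 120,
    frozenset(("bank", "concern")): 15,
    frozenset(("bench", "interest")): 2,
    frozenset(("bench", "concern")): 1,
}


def structured_query(source_terms, dictionary):
    """Keep every candidate translation, grouped per source term."""
    groups = []
    for term in source_terms:
        # Terms missing from the dictionary are passed through unchanged.
        candidates = dictionary.get(term, [term])
        if len(candidates) == 1:
            groups.append(candidates[0])
        else:
            groups.append("#syn( " + " ".join(candidates) + " )")
    return "#combine( " + " ".join(groups) + " )"


def select_by_cooccurrence(source_terms, dictionary, cooc):
    """Pick one translation per source term by summed co-occurrence
    with the candidate translations of all other source terms."""
    selected = []
    for i, term in enumerate(source_terms):
        candidates = dictionary.get(term, [term])

        def score(candidate):
            total = 0.0
            for j, other in enumerate(source_terms):
                if j == i:
                    continue
                for other_candidate in dictionary.get(other, [other]):
                    pair = frozenset((candidate, other_candidate))
                    total += cooc.get(pair, 0.0)
            return total

        # max() keeps the first candidate in case of a tie.
        selected.append(max(candidates, key=score))
    return selected


query = ["banku", "interes"]
print(structured_query(query, TOY_DICTIONARY))
print(select_by_cooccurrence(query, TOY_DICTIONARY, TOY_COOC))
```

In this sketch the structured query defers disambiguation to the retrieval engine by keeping all candidates, while the selection approach commits to one translation per term before retrieval, so an early wrong choice cannot be recovered later.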
