Paper information - Estimating Translation Probabilities from the Web for Structured Queries on CLIR


We present two methods for estimating replacement probabilities without using parallel corpora. The first exploits the translation probabilities that may be latent in machine-readable dictionaries (MRDs). The second, which is more robust, uses context-similarity techniques to estimate word translation probabilities, treating the Internet as a bilingual comparable corpus. Experiments show that using the replacement probabilities obtained with the proposed methods yields a statistically significant improvement in MAP over non-weighted structured queries. The context-similarity method yields the most significant improvement.
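To make the contrast between weighted and non-weighted structured queries concrete, here is a minimal sketch, not the authors' implementation. It assumes the common formulation in which the term frequency of a source-language query term is the sum of the frequencies of its candidate translations in the document, each scaled by its replacement probability. The function names, the example documents, and the probabilities are all hypothetical.

```python
from collections import Counter


def unweighted_tf(doc_tokens, translations):
    """Non-weighted structured query: every candidate translation counts fully."""
    counts = Counter(doc_tokens)
    return sum(counts[t] for t in translations)


def weighted_tf(doc_tokens, translations):
    """Weighted structured query: each translation's count is scaled by its
    (assumed) replacement probability before summing."""
    counts = Counter(doc_tokens)
    return sum(p * counts[t] for t, p in translations.items())


# Hypothetical replacement probabilities for one ambiguous source-language term.
translations = {"bank": 0.8, "bench": 0.2}

doc_a = "the bank approved the loan at the bank".split()
doc_b = "a wooden bench and a stone bench".split()

print(unweighted_tf(doc_a, translations), weighted_tf(doc_a, translations))
print(unweighted_tf(doc_b, translations), weighted_tf(doc_b, translations))
```

Without weights, both documents match the query term equally often. With weights, the document containing the more probable translation receives the higher term frequency, which is the effect that estimated replacement probabilities are meant to produce.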

Maddalen Lopez de Lacalle | Xabier Saralegi
