Enhancing cross-language information retrieval by an automatic acquisition of bilingual terminology from comparable corpora

This paper presents an approach to bilingual lexicon extraction from comparable corpora and evaluations on Cross-Language Information Retrieval. We explore a bi-directional extraction of bilingual terminology primarily from comparable corpora. A combined statistics-based and linguistics-based model to select best translation candidates to phrasal translation is proposed. Evaluations using a large test collection for Japanese-English revealed the proposed combination of bi-directional comparable corpora, bilingual dictionaries and transliteration, augmented with linguistics-based pruning to be highly effective in Cross-Language Information Retrieval.