Deriving a Bilingual Lexicon for Cross-Language Information Retrieval

In this paper we describe a systematic approach to deriving a bilingual lexicon automatically from parallel corpora. Following this approach, a lexicon was derived from the English and Dutch versions of the Agenda 21 corpus. With the lexicon and a part of the corpus that was not used to derive the lexicon, a bilingual retrieval environment was built. Recall and precision of monolingual (Dutch) retrieval were compared to recall and precision of bilingual (Dutch-to-English) retrieval. An experiment was conducted with the help of eight naive users who formulated queries and judged the relevance of retrieved fragments. The experiment shows 78% precision and 51% relative recall for monolingual retrieval, against 67% precision and 82% relative recall for bilingual retrieval.
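
The two measures reported above can be illustrated with a minimal sketch. It assumes the common pooled definition of relative recall (relevant fragments retrieved by one system, divided by the relevant fragments retrieved by any of the compared systems); the abstract does not spell out the exact definition used in the experiment. The function names and the toy fragment identifiers are hypothetical and are not data from the experiment.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved fragments that were judged relevant."""
    retrieved = set(retrieved)
    if not retrieved:
        return 0.0
    return len(retrieved & set(relevant)) / len(retrieved)


def relative_recall(retrieved, relevant, pool):
    """Fraction of the relevant fragments in the pool that this system found.

    Assumption: `pool` is the union of fragments retrieved by all
    compared systems, so recall is relative to what any system found.
    """
    relevant_in_pool = set(relevant) & set(pool)
    if not relevant_in_pool:
        return 0.0
    return len(set(retrieved) & relevant_in_pool) / len(relevant_in_pool)


# Toy example with hypothetical fragment identifiers (not experimental data).
monolingual = {"f1", "f2", "f3", "f4"}
bilingual = {"f2", "f3", "f5", "f6", "f7"}
judged_relevant = {"f1", "f2", "f3", "f5", "f6"}
pool = monolingual | bilingual

mono_p = precision(monolingual, judged_relevant)
mono_r = relative_recall(monolingual, judged_relevant, pool)
bi_p = precision(bilingual, judged_relevant)
bi_r = relative_recall(bilingual, judged_relevant, pool)
```

Under this definition, a system can have lower precision yet higher relative recall than another if it retrieves more of the pooled relevant fragments, which is the pattern the reported figures show.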