Assessing Automatically Extracted Bilingual Lexicons for CLIR in Vertical Domains: XRCE Participation in the GIRT Track of CLEF 2002

In this paper, we describe the approach we used at the Cross-Language Evaluation Forum (CLEF) 2002, specifically in the GIRT task. The approach is based on (1) the extraction of two bilingual lexicons, one from parallel corpora and the other from comparable corpora, (2) the optimal combination of these bilingual lexicons for Cross-Language Information Retrieval (CLIR), and (3) the combination with monolingual IR on parallel corpora. While our original submission to CLEF 2002 was restricted to short queries (using only the title field), we present here results extended to complete queries.
