Automatic dictionary extraction for cross-language information retrieval

In experiments comparing several methods for cross-language information retrieval using a bilingual training corpus, including methods based on machine translation and on “traditional” information-retrieval techniques, a fairly simple statistical technique for automatically extracting a bilingual dictionary from parallel text performed best. Surprisingly, an improvement to the extraction method that significantly increases the accuracy of the dictionary proved slightly detrimental to overall retrieval performance, even though it is highly beneficial for other applications. This chapter describes the extraction method and its enhancement in detail, and compares the performance of a retrieval system that uses the automatically generated dictionaries with that of other retrieval methods.
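
The abstract does not spell out the extraction statistic, so the following is only a minimal sketch of the general idea of deriving a bilingual dictionary from sentence-aligned parallel text. It assumes whitespace tokenization and scores word pairs with a Dice coefficient over sentence-level co-occurrence counts; the function name `extract_dictionary`, the `threshold` parameter, and the toy corpus are all hypothetical and are not taken from the chapter.

```python
from collections import Counter
from itertools import product


def extract_dictionary(sentence_pairs, threshold=0.5):
    """Return {source_word: [(target_word, score), ...]} from aligned sentence pairs.

    Scores are Dice coefficients over sentence-level co-occurrence counts.
    This statistic is an illustrative assumption, not the chapter's method.
    """
    src_count = Counter()   # sentence pairs containing each source word
    tgt_count = Counter()   # sentence pairs containing each target word
    pair_count = Counter()  # sentence pairs containing both words of a pair

    for src_sentence, tgt_sentence in sentence_pairs:
        # Sets, so a word repeated within one sentence is counted once.
        src_words = set(src_sentence.lower().split())
        tgt_words = set(tgt_sentence.lower().split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        pair_count.update(product(src_words, tgt_words))

    dictionary = {}
    for (s, t), both in pair_count.items():
        score = 2.0 * both / (src_count[s] + tgt_count[t])
        if score >= threshold:
            dictionary.setdefault(s, []).append((t, score))

    # Best-scoring candidates first; ties broken alphabetically.
    for candidates in dictionary.values():
        candidates.sort(key=lambda item: (-item[1], item[0]))
    return dictionary


# Hypothetical three-sentence English-French corpus.
toy_corpus = [
    ("the house", "la maison"),
    ("the car", "la voiture"),
    ("a house", "une maison"),
]
dictionary = extract_dictionary(toy_corpus, threshold=0.9)
```

In this sketch, raising `threshold` gives a smaller dictionary with fewer spurious pairs, while lowering it keeps more candidate translations per word. Which setting serves retrieval better is an empirical question; the abstract reports that, in the chapter's experiments, the more accurate dictionary was slightly detrimental to retrieval performance.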
