Reranking Hypotheses of Machine-Translated Queries for Cross-Lingual Information Retrieval

Machine Translation (MT) systems employed to translate queries for Cross-Lingual Information Retrieval typically produce a single translation that maximizes translation quality. This translation, however, may not be optimal with respect to retrieval quality, and other translation variants might lead to better retrieval results. In this paper, we explore a method that uses multiple translations produced by an MT system and reranks them with a supervised machine-learning method trained to directly optimize retrieval quality. We experiment with various types of features, and the results obtained on the medical-domain test collection from the CLEF eHealth Lab series show a significant improvement in retrieval quality compared with a system that uses the single translation provided by the MT system.
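The core idea is to score each translation variant with a model trained to predict retrieval quality and to keep the top-ranked variant as the query. A minimal sketch of that idea follows. It rests on stated assumptions: the two feature values, the toy training data, and the names `Hypothesis`, `train_linear`, and `rerank` are hypothetical. The pointwise linear regressor fitted by gradient descent is a stand-in chosen for brevity, not necessarily the learner or feature set used in the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Hypothesis:
    """One translation variant from the MT system's n-best list."""
    text: str
    features: List[float]  # hypothetical feature values for this variant


def predict(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear score; weights[0] is the bias term."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))


def train_linear(
    examples: List[Tuple[List[float], float]],
    n_features: int,
    lr: float = 0.1,
    epochs: int = 2000,
) -> List[float]:
    """Fit weights by batch gradient descent on the mean squared error.

    Each example pairs a hypothesis's features with the retrieval quality
    (e.g. P@10) measured when that hypothesis was used as the query.
    """
    weights = [0.0] * (n_features + 1)
    n = len(examples)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for feats, target in examples:
            err = predict(weights, feats) - target
            grad[0] += err
            for j, x in enumerate(feats):
                grad[j + 1] += err * x
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights


def rerank(weights: Sequence[float], hypotheses: List[Hypothesis]) -> List[Hypothesis]:
    """Order hypotheses by predicted retrieval quality, best first."""
    return sorted(hypotheses, key=lambda h: predict(weights, h.features), reverse=True)


# Toy training data (made up for illustration): quality = 0.3 + 0.5*f1 - 0.2*f2.
train = [
    ([0.0, 0.0], 0.30),
    ([1.0, 0.0], 0.80),
    ([0.0, 1.0], 0.10),
    ([1.0, 1.0], 0.60),
    ([0.5, 0.5], 0.45),
]
weights = train_linear(train, n_features=2)

# A hypothetical n-best list for one query; the reranker keeps the top variant.
nbest = [
    Hypothesis("cardiac infarction signs", [0.4, 0.8]),
    Hypothesis("myocardial infarction symptoms", [0.7, 0.0]),
    Hypothesis("heart attack symptoms", [0.9, 0.1]),
]
best = rerank(weights, nbest)[0]
print(best.text)  # prints "heart attack symptoms"
```

In practice, the training targets would come from running retrieval with every hypothesis of each training query and scoring the results against relevance judgments. The sketch simply assumes those numbers are already available.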
