Flat vs. hierarchical phrase-based translation models for cross-language information retrieval

Although context-independent word-based approaches remain popular for cross-language information retrieval, many recent studies have shown that integrating insights from modern statistical machine translation systems can lead to substantial improvements in effectiveness. In this paper, we compare flat and hierarchical phrase-based translation models for query translation. Both approaches yield significantly better results than either a token-based or a one-best translation baseline on standard test collections. The choice between the two models involves tradeoffs among effectiveness, efficiency, and model compactness.
