Extending memory-based machine translation to phrases

We present a phrase-based extension to memory-based machine translation. This form of example-based machine translation employs lazy-learning classifiers to translate fragments of the source sentence into fragments of the target sentence. Each source-side fragment consists of a variable-length phrase in a local context of neighboring words, which the classifier translates to a target-language phrase. We compare three methods of phrase extraction and present a new decoder that reassembles the translated fragments into one final translation. Results show that one of the proposed phrase-extraction methods, the one used in Moses, leads to a translation system that outperforms context-sensitive word-based approaches. The differences, however, are small, arguably because the word-based approaches already capture phrasal context implicitly through their source-side and target-side context sensitivity.
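
As an illustration of the fragment-translation step only, the following minimal sketch stores training instances in memory and translates a source phrase by nearest-neighbor lookup. It is not the system described here: the names `make_instance` and `MemoryBasedClassifier`, the one-word context window, the feature-overlap similarity, and the toy training pairs are all assumptions made for the example, and phrase extraction and decoding are left out.

```python
from collections import Counter


def make_instance(words, start, end, context=1):
    """Build a feature tuple: left context, the phrase words[start:end], right context.

    Positions outside the sentence are padded with "_" (an assumed convention).
    """
    pad = ["_"] * context
    padded = pad + words + pad
    left = tuple(padded[start:start + context])
    right = tuple(padded[end + context:end + 2 * context])
    phrase = " ".join(words[start:end])
    return left + (phrase,) + right


class MemoryBasedClassifier:
    """Lazy learner: training only stores instances; all work happens at lookup."""

    def __init__(self):
        self.memory = []  # list of (feature tuple, target phrase)

    def train(self, features, target):
        self.memory.append((features, target))

    def classify(self, features):
        def overlap(a, b):
            # Number of feature positions with identical values.
            return sum(x == y for x, y in zip(a, b))

        best = max(overlap(features, stored) for stored, _ in self.memory)
        # Majority vote among all stored instances at the best overlap.
        votes = Counter(
            target for stored, target in self.memory
            if overlap(features, stored) == best
        )
        return votes.most_common(1)[0][0]


# Toy, invented training pairs (not data from the paper).
clf = MemoryBasedClassifier()
clf.train(("_", "het", "kleine"), "the")
clf.train(("het", "kleine", "huis"), "small")
clf.train(("kleine", "huis", "_"), "house")

# An unseen right-context word still matches on the phrase and its left context.
query = make_instance(["het", "kleine", "huisje"], 1, 2)
print(clf.classify(query))
```

Because the phrase and its context words are separate features, a partial match can still retrieve a translation when the exact fragment was never seen, which is the kind of context sensitivity the abstract refers to.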
