Wrapper Syntax for Example-Based Machine Translation

TransBooster is a wrapper technology designed to improve the performance of wide-coverage machine translation systems. Using linguistically motivated syntactic information, it automatically decomposes source language sentences into shorter and syntactically simpler chunks, and recomposes their translations to form target language sentences. This generally improves both the word order and the lexical selection of the translation. To date, TransBooster has been successfully applied to rule-based MT, statistical MT, and multi-engine MT. This paper presents the application of TransBooster to Example-Based Machine Translation (EBMT). In an experiment conducted on test sets extracted from Europarl and the Penn II Treebank, we show that our method can raise the BLEU score by up to 3.8% relative to the EBMT baseline. We also conduct a manual evaluation, showing that TransBooster-enhanced EBMT produces better output than the baseline EBMT in terms of fluency in 55% of the cases and in terms of accuracy in 53% of the cases.
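The decompose, translate, recompose idea described above can be illustrated with a minimal sketch. This is not the TransBooster implementation: the chunk boundaries are supplied by hand (standing in for the output of a syntactic parser), the function names `toy_ebmt_translate` and `wrap_translate` are hypothetical, the baseline engine is a toy example lookup, and recomposition is plain concatenation in source order.

```python
from typing import Callable, List

# Hypothetical example base: a tiny English-to-French chunk memory.
# A real EBMT engine would retrieve and combine examples from a large corpus.
EXAMPLE_BASE = {
    "the chairman": "le président",
    "opened": "a ouvert",
    "the session": "la séance",
}


def toy_ebmt_translate(chunk: str) -> str:
    """Stand-in for a baseline MT engine: exact lookup of a chunk.

    Unknown chunks are passed through unchanged.
    """
    return EXAMPLE_BASE.get(chunk.lower(), chunk)


def wrap_translate(chunks: List[str], translate: Callable[[str], str]) -> str:
    """Translate each (shorter, simpler) chunk separately, then recompose.

    Assumption: the chunks are already in an order that is acceptable for
    the target language, so recomposition is simple concatenation.
    """
    translated = [translate(chunk) for chunk in chunks]
    return " ".join(translated)


# Chunks assumed to come from a syntactic analysis of
# "The chairman opened the session".
source_chunks = ["The chairman", "opened", "the session"]
result = wrap_translate(source_chunks, toy_ebmt_translate)
print(result)
```

The wrapper takes the translation function as a parameter, so the same sketch would apply to any baseline engine that maps a source string to a target string.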
