Phrase-based Machine Translation in a Computer-assisted Translation Environment

We explore the problem of integrating a phrase-based MT system within a computer-assisted translation (CAT) environment. We argue that one way of achieving successful integration is to design an MT system that behaves more like the translation memory (TM) component of CAT systems. This implies producing MT output that is consistent with that of a TM when high-similarity material exists in the training data; it also implies providing the MT system with a component that is capable of filtering out machine translations that are less likely to be useful. We propose solutions to both problems, and evaluate their impact on three different data sets. Our results indicate that the proposed approach leads to systems that produce better output than a TM, for a larger portion of the source text.
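The two requirements named above (TM-like behaviour on high-similarity material, and filtering of unhelpful MT output) can be illustrated with a minimal sketch. This is not the system described in the paper: the token-level similarity measure, the threshold values, and the `translate` and `confidence` callables are all hypothetical stand-ins chosen for illustration, and the sketch simply falls back to the TM target rather than making the MT output itself consistent with it.

```python
from difflib import SequenceMatcher


def fuzzy_match(source, memory):
    """Return (similarity, target) for the TM entry closest to `source`.

    `memory` is a list of (source_sentence, target_sentence) pairs.
    Similarity is a token-level SequenceMatcher ratio, used here only as
    an illustrative stand-in for a real TM fuzzy-match score.
    """
    best_score, best_target = 0.0, None
    src_tokens = source.split()
    for tm_source, tm_target in memory:
        score = SequenceMatcher(None, src_tokens, tm_source.split()).ratio()
        if score > best_score:
            best_score, best_target = score, tm_target
    return best_score, best_target


def propose(source, memory, translate, confidence,
            tm_threshold=0.85, mt_threshold=0.5):
    """Pick a proposal for the translator, or none at all.

    `translate` and `confidence` are hypothetical callables supplied by
    the caller; both thresholds are arbitrary illustrative values.
    """
    score, tm_target = fuzzy_match(source, memory)
    if tm_target is not None and score >= tm_threshold:
        # High-similarity material exists: behave like the TM.
        return ("TM", tm_target)
    hypothesis = translate(source)
    if confidence(source, hypothesis) >= mt_threshold:
        # The MT output passes the filter and is shown to the translator.
        return ("MT", hypothesis)
    # Filtered out: showing nothing is preferred to showing unhelpful MT.
    return (None, None)
```

In this sketch, a caller would pass its own MT decoder as `translate` and a trained confidence estimator as `confidence`; returning `(None, None)` corresponds to the case where the filter suppresses the machine translation.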
