Neural Fuzzy Repair: Integrating Fuzzy Matches into Neural Machine Translation

We present a simple yet powerful data augmentation method for boosting Neural Machine Translation (NMT) performance by leveraging information retrieved from a Translation Memory (TM). We propose and test two ways of augmenting NMT training data with fuzzy TM matches. Tests on the DGT-TM data set for two language pairs show consistent and substantial improvements over a range of baseline systems. Given its ease of implementation, the method appears promising for any translation environment in which a sizeable TM is available and a certain amount of repetition across translations is to be expected.
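
The abstract does not spell out how the training data is augmented, so the following is only a minimal sketch of one plausible form: for each source sentence, retrieve the most similar TM entry above a similarity threshold and append the target side of that entry to the source, separated by a special token. The separator `@@@`, the threshold of 0.5, the helper names, and the toy TM entries are all assumptions made for illustration, and Python's `difflib` ratio stands in for whatever fuzzy match metric is actually used.

```python
from difflib import SequenceMatcher

SEPARATOR = "@@@"  # hypothetical token marking where the fuzzy match target begins
THRESHOLD = 0.5    # hypothetical minimum similarity for a match to be used


def similarity(a, b):
    """Token-level similarity in [0, 1]; a difflib stand-in for a fuzzy match score."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def best_fuzzy_match(source, tm, threshold=THRESHOLD):
    """Return the (tm_source, tm_target) pair most similar to `source`, or None."""
    best, best_score = None, 0.0
    for tm_source, tm_target in tm:
        if tm_source == source:
            continue  # skip exact matches, e.g. the sentence itself during training
        score = similarity(source, tm_source)
        if score >= threshold and score > best_score:
            best, best_score = (tm_source, tm_target), score
    return best


def augment_source(source, tm, threshold=THRESHOLD):
    """Append the target side of the best fuzzy match to the source, if one exists."""
    match = best_fuzzy_match(source, tm, threshold)
    if match is None:
        return source  # no usable match: leave the source unchanged
    return f"{source} {SEPARATOR} {match[1]}"


# Toy translation memory (illustrative English-Dutch pairs, not taken from DGT-TM).
tm = [
    ("the committee adopted the regulation",
     "de commissie heeft de verordening goedgekeurd"),
    ("annex one is replaced", "bijlage een wordt vervangen"),
]

augmented = augment_source("the committee adopted the decision", tm)
unchanged = augment_source("completely unrelated words here", tm)
```

In this sketch the augmented source, paired with its original reference translation, would be used as an NMT training example, and the same retrieval step would be applied to each input at translation time. A linear scan over the TM is only workable for toy data; a sizeable TM would need an indexed or approximate search.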
