Neural Fuzzy Repair: Integrating Fuzzy Matches into Neural Machine Translation

We present a simple yet powerful data augmentation method for boosting Neural Machine Translation (NMT) performance by leveraging information retrieved from a Translation Memory (TM). We propose and test two methods for augmenting NMT training data with fuzzy TM matches. Tests on the DGT-TM data set for two language pairs show consistent and substantial improvements over a range of baseline systems. The results suggest that this method is promising for any translation environment in which a sizeable TM is available and a certain amount of repetition across translations is to be expected, especially considering its ease of implementation.
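The abstract does not spell out the two augmentation methods, so the following is only a minimal sketch of the general idea: retrieve a fuzzy match for a source sentence from the TM and attach the match's target side to the source before training. The token-level edit-distance score, the `0.5` threshold, the `@@@` separator token, and all function names are assumptions made for illustration, not details taken from the abstract.

```python
from typing import List, Optional, Tuple

SEP = "@@@"  # hypothetical separator token between source and fuzzy-match target


def edit_distance(a: List[str], b: List[str]) -> int:
    """Token-level Levenshtein distance between two token lists."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def fuzzy_score(a: str, b: str) -> float:
    """Similarity in [0, 1]: 1 minus edit distance over the longer length."""
    ta, tb = a.split(), b.split()
    longest = max(len(ta), len(tb))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(ta, tb) / longest


def best_match(source: str,
               tm: List[Tuple[str, str]],
               threshold: float = 0.5) -> Optional[Tuple[str, str]]:
    """Return the highest-scoring TM entry at or above the threshold, if any."""
    best, best_score = None, threshold
    for tm_src, tm_tgt in tm:
        if tm_src == source:
            continue  # skip the sentence itself when the TM is the training set
        score = fuzzy_score(source, tm_src)
        if score >= best_score:
            best, best_score = (tm_src, tm_tgt), score
    return best


def augment(source: str, tm: List[Tuple[str, str]], threshold: float = 0.5) -> str:
    """Append the target side of the best fuzzy match to the source sentence."""
    match = best_match(source, tm, threshold)
    if match is None:
        return source  # no usable match: leave the source unchanged
    return f"{source} {SEP} {match[1]}"
```

In this sketch a sentence without a sufficiently similar TM entry passes through unchanged, so augmented and plain sources can be mixed in the same training set. The quadratic scan over the TM is only suitable for toy data; a sizeable TM would need an indexed or approximate retrieval step.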

Bram Bulté | Arda Tezcan
