Transductive Data-Selection Algorithms for Fine-Tuning Neural Machine Translation

Machine Translation models are trained to translate a variety of documents from one language into another. However, models trained specifically for the particular characteristics of the documents to be translated tend to perform better. Fine-tuning is a technique for adapting an NMT model to a given domain. In this work, we use this technique to adapt the model to a given test set. In particular, we use transductive data-selection algorithms, which take advantage of the information in the test set to retrieve sentences from a larger parallel corpus. In cases where the model is available at translation time (i.e., when the test set is provided), it can be adapted with a small subset of the data, thereby achieving better performance than either a generic model or a domain-adapted model.
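
Feature Decay Algorithms (FDA) are a representative member of this transductive family: candidate sentences are scored by how many test-set n-grams they cover, and the value of an n-gram decays once it has been covered, so later picks favour still-uncovered features. The following is a minimal sketch of such a selector, assuming unigram/bigram features, a greedy length-normalised score, and a 0.5 decay factor; the function name fda_select and all parameter values are illustrative choices, not taken from the paper.

```python
def ngrams(tokens, max_n=2):
    """Yield all n-grams (as tuples) of the token list up to length max_n."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def fda_select(test_sentences, pool, budget, max_n=2, decay=0.5):
    """Greedily pick up to `budget` pool sentences whose n-grams best
    cover the test set, decaying a feature's value once it is covered."""
    # Features are the n-grams of the test set; each starts with value 1.0.
    value = {g: 1.0 for s in test_sentences for g in ngrams(s.split(), max_n)}

    candidates = [s.split() for s in pool]
    selected = []
    for _ in range(min(budget, len(candidates))):
        best_i, best_score = None, 0.0
        for i, toks in enumerate(candidates):
            if not toks:  # already selected (None) or empty sentence
                continue
            # Length-normalised sum of the current values of matched features.
            score = sum(value.get(g, 0.0) for g in ngrams(toks, max_n)) / len(toks)
            if score > best_score:
                best_i, best_score = i, score
        if best_i is None:  # nothing left shares features with the test set
            break
        chosen = candidates[best_i]
        candidates[best_i] = None
        selected.append(" ".join(chosen))
        # Decay every feature the chosen sentence covers, so later picks
        # are rewarded mainly for n-grams that are still uncovered.
        for g in set(ngrams(chosen, max_n)):
            if g in value:
                value[g] *= decay
    return selected


if __name__ == "__main__":
    test = ["the patient was treated with aspirin"]
    pool = [
        "the weather is nice today",
        "the patient received aspirin daily",
        "patients were treated with a placebo",
        "stock prices fell sharply",
    ]
    for sent in fda_select(test, pool, budget=2):
        print(sent)
```

The decay step is what makes the selection both transductive and diverse: without it, the selector would repeatedly pick near-duplicates of the single best-matching sentence instead of covering the whole test set.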
