Word-based Domain Adaptation for Neural Machine Translation

In this paper, we empirically investigate applying word-level weights to adapt neural machine translation to e-commerce domains, where small e-commerce datasets and large out-of-domain datasets are available. In order to mine in-domain like words in the out-of-domain datasets, we compute word weights by using a domain-specific and a non-domain-specific language model followed by smoothing and binary quantization. The baseline model is trained on mixed in-domain and out-of-domain datasets. Experimental results on English to Chinese e-commerce domain translation show that compared to continuing training without word weights, it improves MT quality by up to 2.11% BLEU absolute and 1.59% TER. We have also trained models using fine-tuning on the in-domain data. Pre-training a model with word weights improves fine-tuning up to 1.24% BLEU absolute and 1.64% TER, respectively.

[1]  Philipp Koehn,et al.  Moses: Open Source Toolkit for Statistical Machine Translation , 2007, ACL.

[2]  Roland Kuhn,et al.  Discriminative Instance Weighting for Domain Adaptation in Statistical Machine Translation , 2010, EMNLP.

[3]  Rico Sennrich,et al.  Neural Machine Translation of Rare Words with Subword Units , 2015, ACL.

[4]  Lemao Liu,et al.  Instance Weighting for Neural Machine Translation Domain Adaptation , 2017, EMNLP.

[5]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[6]  Jianfeng Gao,et al.  Domain Adaptation via Pseudo In-Domain Data Selection , 2011, EMNLP.

[7]  Shahram Khadivi,et al.  Learning from Chunk-based Feedback in Neural Machine Translation , 2018, ACL.

[8]  Kevin Duh,et al.  Adaptation Data Selection using Neural Language Models: Experiments in Machine Translation , 2013, ACL.

[9]  Hermann Ney,et al.  Word-Level Confidence Estimation for Machine Translation , 2007, CL.

[10]  Yuan Yu,et al.  TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[11]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[12]  Tanja Schultz,et al.  Bilingual-LSA Based LM Adaptation for Spoken Language Translation , 2007, ACL.

[13]  Matthew G. Snover,et al.  A Study of Translation Edit Rate with Targeted Human Annotation , 2006, AMTA.

[14]  Holger Schwenk,et al.  Investigations on large-scale lightly-supervised training for statistical machine translation. , 2008, IWSLT.

[15]  George Kurian,et al.  Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation , 2016, ArXiv.

[16]  William D. Lewis,et al.  Intelligent Selection of Language Model Training Data , 2010, ACL.

[17]  George F. Foster,et al.  Cost Weighting for Neural Machine Translation Domain Adaptation , 2017, NMT@ACL.

[18]  Hai Zhao,et al.  Attention Is All You Need for Chinese Word Segmentation , 2019, EMNLP.

[19]  Markus Freitag,et al.  Fast Domain Adaptation for Neural Machine Translation , 2016, ArXiv.

[20]  Christopher D. Manning,et al.  Stanford Neural Machine Translation Systems for Spoken Language Domains , 2015, IWSLT.

[21]  Christopher D. Manning,et al.  Optimizing Chinese Word Segmentation for Machine Translation Performance , 2008, WMT@ACL.

[22]  Spyridon Matsoukas,et al.  Discriminative Corpus Weight Estimation for Machine Translation , 2009, EMNLP.

[23]  Shahram Khadivi,et al.  Neural and Statistical Methods for Leveraging Meta-information in Machine Translation , 2017, MTSUMMIT.

[24]  Rico Sennrich,et al.  Improving Neural Machine Translation Models with Monolingual Data , 2015, ACL.

[25]  Chenhui Chu,et al.  An Empirical Comparison of Domain Adaptation Methods for Neural Machine Translation , 2017, ACL.

[26]  Kenneth Heafield,et al.  KenLM: Faster and Smaller Language Model Queries , 2011, WMT@EMNLP.

[27]  Premkumar Natarajan,et al.  Incremental Topic-Based Translation Model Adaptation for Conversational Spoken Language Translation , 2013, ACL.

[28]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[29]  Shafiq R. Joty,et al.  Using joint models or domain adaptation in statistical machine translation , 2015, MTSUMMIT.