Domain Differential Adaptation for Neural Machine Translation

Neural networks are known to be data-hungry and domain-sensitive, but it is nearly impossible to obtain large quantities of labeled data for every domain of interest. This necessitates domain adaptation strategies. One common strategy encourages generalization by aligning global distribution statistics between the source and target domains, but a drawback is that the statistics of different domains or tasks are inherently divergent, and smoothing over these differences can lead to sub-optimal performance. In this paper, we propose the framework of Domain Differential Adaptation (DDA), in which, instead of smoothing over these differences, we embrace them: we directly model the difference between domains using models trained on a related task, and then use these learned domain differentials to adapt models for the target task accordingly. Experimental results on domain adaptation for neural machine translation demonstrate the effectiveness of this strategy, which achieves consistent improvements over alternative adaptation strategies across multiple experimental settings.
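To make the idea concrete, below is a minimal sketch of one way a learned domain differential could be applied to neural machine translation, assuming the related task is language modeling: an in-domain and an out-of-domain language model are trained on monolingual data, and the difference of their log-probabilities is added to the NMT model's scores at each decoding step. The function and parameter names (dda_adjusted_log_probs, lam) are illustrative assumptions for this sketch, not a released implementation.

```python
import numpy as np

def dda_adjusted_log_probs(nmt_log_probs: np.ndarray,
                           lm_in_log_probs: np.ndarray,
                           lm_out_log_probs: np.ndarray,
                           lam: float = 0.5) -> np.ndarray:
    """Combine per-token scores over the target vocabulary for one decoding step.

    The in-domain / out-of-domain LM log-probability difference acts as
    the learned domain differential, shifting the out-of-domain NMT
    distribution toward the target domain. `lam` (hypothetical) controls
    the strength of the shift.
    """
    scores = nmt_log_probs + lam * (lm_in_log_probs - lm_out_log_probs)
    # Renormalize with log-sum-exp so the result is a valid log-distribution
    # and beam-search scores stay comparable across hypotheses.
    return scores - np.logaddexp.reduce(scores)
```

Because all three models are assumed to share the target vocabulary here, the differential amounts to a per-token reweighting of the NMT distribution; words the in-domain LM prefers over the out-of-domain LM are boosted, and vice versa.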
