Phrase-level Active Learning for Neural Machine Translation

Neural machine translation (NMT) is sensitive to domain shift. In this paper, we address this problem in an active learning setting where we can spend a given budget on translating in-domain data, and gradually fine-tune a pre-trained out-of-domain NMT model on the newly translated data. Existing active learning methods for NMT usually select sentences based on uncertainty scores, but these methods require costly translation of full sentences even when only one or two key phrases within the sentence are informative. To address this limitation, we re-examine previous work from the phrase-based machine translation (PBMT) era that selected not full sentences, but rather individual phrases. However, while incorporating these phrases into PBMT systems was relatively simple, it is less straightforward for NMT systems, which need to be trained on full sequences to capture larger structural properties of sentences unique to the new domain. To overcome these hurdles, we propose to select both full sentences and individual phrases from unlabelled data in the new domain for routing to human translators. In a German-English translation task, our active learning approach achieves consistent improvements over uncertainty-based sentence selection methods, gaining up to 1.2 BLEU over strong active learning baselines.
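
The mixed sentence-and-phrase selection described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes length-normalized negative log-likelihood as the sentence uncertainty score, raw bigram frequency in the unlabeled pool as the phrase value signal, and a fixed token-budget split between sentences and phrases; all function names and data layouts are invented for the example.

```python
from collections import Counter

def sentence_uncertainty(logprobs):
    """Length-normalized negative log-likelihood of the model's own
    translation (an assumed uncertainty score): higher = less confident."""
    return -sum(logprobs) / max(len(logprobs), 1)

def candidate_phrases(sentences, n=2):
    """Count n-grams in the unlabeled in-domain pool; frequent n-grams
    are cheap, potentially high-value annotation targets."""
    counts = Counter()
    for tokens in sentences:
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def select_for_annotation(pool, budget_tokens, phrase_frac=0.5, n=2):
    """Split the token budget: part goes to whole sentences ranked by
    model uncertainty, the rest to frequent n-gram phrases drawn from
    the sentences left in the pool. `pool` is a list of
    (source_tokens, per_token_logprobs) pairs; the 50/50 default split
    is an illustrative assumption, not the paper's setting."""
    sent_budget = int(budget_tokens * (1 - phrase_frac))
    ranked = sorted(pool, key=lambda s: sentence_uncertainty(s[1]),
                    reverse=True)

    # Fill the sentence budget with the most uncertain sentences.
    chosen_sents, used = [], 0
    for tokens, _ in ranked:
        if used + len(tokens) > sent_budget:
            break
        chosen_sents.append(tokens)
        used += len(tokens)

    # Spend the remainder on frequent phrases from the unselected pool.
    rest = [tokens for tokens, _ in ranked[len(chosen_sents):]]
    phrase_budget = budget_tokens - sent_budget
    chosen_phrases, used = [], 0
    for phrase, _count in candidate_phrases(rest, n).most_common():
        if used + n > phrase_budget:
            break
        chosen_phrases.append(phrase)
        used += n

    return chosen_sents, chosen_phrases
```

The budget split is the key design choice this sketch exposes: sentences teach the fine-tuned model the new domain's full-sequence structure, while phrases cover informative terminology at a fraction of the annotation cost.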
