Self-Training Sampling with Monolingual Data Uncertainty for Neural Machine Translation

Self-training has proven effective for improving neural machine translation (NMT) performance by augmenting model training with synthetic parallel data. The common practice is to construct synthetic data from a randomly sampled subset of large-scale monolingual data, which we empirically show to be sub-optimal. In this work, we propose to improve the sampling procedure by selecting the most informative monolingual sentences to complement the parallel data. To this end, we compute the uncertainty of monolingual sentences using a bilingual dictionary extracted from the parallel data. Intuitively, monolingual sentences with lower uncertainty generally correspond to easy-to-translate patterns that may not provide additional gains. Accordingly, we design an uncertainty-based sampling strategy to efficiently exploit the monolingual data for self-training, in which monolingual sentences with higher uncertainty are sampled with higher probability. Experimental results on the large-scale WMT English⇒German and English⇒Chinese datasets demonstrate the effectiveness of the proposed approach. Extensive analyses suggest that emphasizing learning on uncertain monolingual sentences with our approach does improve the translation quality of high-uncertainty sentences and also benefits the prediction of low-frequency words on the target side.
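
To illustrate the sampling idea, the minimal sketch below approximates a monolingual sentence's uncertainty by the average translation entropy of its words under a probabilistic bilingual dictionary (e.g. one obtained by word-aligning the parallel data), and then samples sentences with probability proportional to that uncertainty. The function names, the entropy-based uncertainty measure, and the sampling-with-replacement shortcut are illustrative assumptions, not the paper's exact formulation.

```python
import math
import random

def sentence_uncertainty(sentence, bilingual_dict):
    """Approximate uncertainty as the average translation entropy of the
    sentence's words under a probabilistic bilingual dictionary that maps
    source word -> {target word: translation probability}.
    Words missing from the dictionary are skipped (an assumption)."""
    entropies = []
    for word in sentence.split():
        probs = bilingual_dict.get(word)
        if not probs:
            continue
        entropies.append(-sum(p * math.log(p) for p in probs.values() if p > 0))
    return sum(entropies) / len(entropies) if entropies else 0.0

def sample_monolingual(sentences, bilingual_dict, k, seed=0):
    """Draw k monolingual sentences with probability proportional to their
    uncertainty, so harder-to-translate sentences are preferred.
    Uses sampling with replacement for simplicity."""
    weights = [sentence_uncertainty(s, bilingual_dict) for s in sentences]
    rng = random.Random(seed)
    if not any(weights):  # fall back to uniform sampling if all weights are zero
        return rng.sample(sentences, k)
    return rng.choices(sentences, weights=weights, k=k)
```

The selected sentences would then be forward-translated by the current NMT model to form the synthetic parallel data used for self-training.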
