Frequency-Aware Contrastive Learning for Neural Machine Translation

Low-frequency word prediction remains a challenge in modern neural machine translation (NMT) systems. Recent adaptive training methods promote the output of infrequent words by emphasizing their weights in the overall training objectives. Despite the improved recall of low-frequency words, their prediction precision is unexpectedly hindered by the adaptive objectives. Inspired by the observation that low-frequency words form a more compact embedding space, we tackle this challenge from a representation learning perspective. Specifically, we propose a frequency-aware token-level contrastive learning method, in which the hidden state of each decoding step is pushed away from the counterparts of other target words, in a soft contrastive way based on the corresponding word frequencies. We conduct experiments on the widely used NIST Chinese-English and WMT14 English-German translation tasks. Empirical results show that our proposed method not only significantly improves translation quality but also enhances lexical diversity and optimizes the word representation space. Further investigation reveals that, compared with related adaptive training strategies, the superiority of our method on low-frequency word prediction lies in the robustness of token-level recall across different frequency bands without sacrificing precision.
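To make the idea of frequency-aware token-level contrastive learning concrete, the sketch below shows one possible PyTorch formulation. It is only an illustration inferred from the abstract: the temperature, the frequency-based soft weighting of negatives, and all function and tensor names are assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of a frequency-aware token-level contrastive loss.
# The exact objective is not specified in the abstract; the temperature,
# the soft frequency weighting, and all names below are illustrative
# assumptions rather than the authors' implementation.
import torch
import torch.nn.functional as F


def frequency_aware_contrastive_loss(hidden, target_embeds, token_freqs, temperature=0.1):
    """
    hidden:        (T, d) decoder hidden states, one per target token.
    target_embeds: (T, d) embeddings of the corresponding gold target tokens.
    token_freqs:   (T,)   corpus frequencies of the gold target tokens.
    """
    # Cosine similarity between every decoding step and every target-token embedding.
    h = F.normalize(hidden, dim=-1)
    e = F.normalize(target_embeds, dim=-1)
    sim = (h @ e.t()) / temperature                     # (T, T)

    # Soft negative weighting: negatives are down- or up-weighted according to
    # their corpus frequency (one possible reading of the "soft contrastive"
    # idea; the paper's actual weighting may differ).
    weights = token_freqs / token_freqs.sum()           # (T,)
    neg_mask = ~torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    logits = torch.where(neg_mask, sim + torch.log(weights + 1e-9), sim)

    # InfoNCE-style objective: each decoding step's positive is its own gold token,
    # so the hidden state is pushed away from the representations of other target words.
    labels = torch.arange(sim.size(0), device=sim.device)
    return F.cross_entropy(logits, labels)
```

In practice such a term would presumably be added to the standard cross-entropy loss with an interpolation coefficient, but that coefficient and the batching details are likewise assumptions here.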
