Multi-Granularity Optimization for Non-Autoregressive Translation

Despite its low latency, non-autoregressive machine translation (NAT) suffers severe performance degradation due to the naive conditional independence assumption. This assumption is further strengthened by the cross-entropy loss, which encourages a strict token-by-token match between the hypothesis and the reference. To alleviate this issue, we propose multi-granularity optimization for NAT, which collects model behaviors on translation segments of various granularities and integrates the feedback for backpropagation. Experiments on four WMT benchmarks show that the proposed method significantly outperforms baseline models trained with cross-entropy loss, achieving the best performance on WMT’16 En ⇔ Ro and highly competitive results on WMT’14 En ⇔ De for fully non-autoregressive translation.
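As a concrete illustration of the idea (a minimal, hypothetical sketch, not the paper's implementation), the code below samples a hypothesis from a NAT decoder, splits it into contiguous segments at several granularities, scores each segment against the proportionally aligned reference span with a toy unigram-F1 reward, and uses those segment rewards to weight the token log-likelihoods in a REINFORCE-style surrogate loss. The names `multi_granularity_loss` and `segment_f1`, the unigram-F1 reward, and the proportional segment alignment are all assumptions introduced purely for illustration.

```python
# Hypothetical sketch of multi-granularity, segment-level feedback for NAT training.
# Not the paper's exact algorithm: the reward function and alignment are simplified.
import torch
import torch.nn.functional as F


def segment_f1(hyp_tokens, ref_tokens):
    """Toy segment-level quality score: unigram F1 between two token lists."""
    if not hyp_tokens or not ref_tokens:
        return 0.0
    hyp_set, ref_set = set(hyp_tokens), set(ref_tokens)
    overlap = len(hyp_set & ref_set)
    if overlap == 0:
        return 0.0
    p, r = overlap / len(hyp_set), overlap / len(ref_set)
    return 2 * p * r / (p + r)


def multi_granularity_loss(log_probs, hyp, ref, granularities=(1, 2, 4)):
    """
    log_probs: (T, V) token log-probabilities from the NAT decoder.
    hyp:       (T,) sampled hypothesis token ids.
    ref:       list of reference token ids.
    For each granularity g, the hypothesis is cut into g contiguous segments;
    each segment's reward weights the log-likelihood of its own tokens
    (a REINFORCE-style surrogate, used here only for illustration).
    """
    T = hyp.size(0)
    token_logp = log_probs.gather(1, hyp.unsqueeze(1)).squeeze(1)  # (T,)
    loss = log_probs.new_zeros(())
    for g in granularities:
        seg_len = max(1, T // g)
        for start in range(0, T, seg_len):
            end = min(start + seg_len, T)
            # Align the hypothesis segment to a proportional reference span.
            ref_start = int(start / T * len(ref))
            ref_end = int(end / T * len(ref))
            reward = segment_f1(hyp[start:end].tolist(), ref[ref_start:ref_end])
            loss = loss - reward * token_logp[start:end].sum()
    return loss / len(granularities)


if __name__ == "__main__":
    torch.manual_seed(0)
    vocab, T = 32, 8
    logits = torch.randn(T, vocab, requires_grad=True)
    log_probs = F.log_softmax(logits, dim=-1)
    hyp = log_probs.argmax(dim=-1)                 # greedy hypothesis from the decoder
    ref = torch.randint(0, vocab, (10,)).tolist()  # toy reference
    loss = multi_granularity_loss(log_probs, hyp, ref)
    loss.backward()
    print("multi-granularity loss:", loss.item())
```

The intent of segment-level rewards, as opposed to token-level cross-entropy, is that correct words and phrases can still be credited when their positions shift, which a strict position-wise match would penalize.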
