Multi-Granularity Optimization for Non-Autoregressive Translation

Despite its low latency, non-autoregressive machine translation (NAT) suffers severe performance degradation due to the naive conditional independence assumption. This assumption is further strengthened by the cross-entropy loss, which encourages a strict token-by-token match between the hypothesis and the reference. To alleviate this issue, we propose multi-granularity optimization for NAT, which collects model behaviors on translation segments of various granularities and integrates the feedback for backpropagation. Experiments on four WMT benchmarks show that the proposed method significantly outperforms baseline models trained with cross-entropy loss, achieving the best performance on WMT’16 En ⇔ Ro and highly competitive results on WMT’14 En ⇔ De for fully non-autoregressive translation.
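As a concrete illustration of the idea (a minimal, hypothetical sketch, not the paper's implementation), the code below samples a hypothesis from a NAT decoder, splits it into contiguous segments at several granularities, scores each segment against the proportionally aligned reference span with a toy unigram-F1 reward, and uses those segment rewards to weight the token log-likelihoods in a REINFORCE-style surrogate loss. The names `multi_granularity_loss` and `segment_f1`, the unigram-F1 reward, and the proportional segment alignment are all assumptions introduced purely for illustration.

```python
# Hypothetical sketch of multi-granularity, segment-level feedback for NAT training.
# Not the paper's exact algorithm: the reward function and alignment are simplified.
import torch
import torch.nn.functional as F


def segment_f1(hyp_tokens, ref_tokens):
    """Toy segment-level quality score: unigram F1 between two token lists."""
    if not hyp_tokens or not ref_tokens:
        return 0.0
    hyp_set, ref_set = set(hyp_tokens), set(ref_tokens)
    overlap = len(hyp_set & ref_set)
    if overlap == 0:
        return 0.0
    p, r = overlap / len(hyp_set), overlap / len(ref_set)
    return 2 * p * r / (p + r)


def multi_granularity_loss(log_probs, hyp, ref, granularities=(1, 2, 4)):
    """
    log_probs: (T, V) token log-probabilities from the NAT decoder.
    hyp:       (T,) sampled hypothesis token ids.
    ref:       list of reference token ids.
    For each granularity g, the hypothesis is cut into g contiguous segments;
    each segment's reward weights the log-likelihood of its own tokens
    (a REINFORCE-style surrogate, used here only for illustration).
    """
    T = hyp.size(0)
    token_logp = log_probs.gather(1, hyp.unsqueeze(1)).squeeze(1)  # (T,)
    loss = log_probs.new_zeros(())
    for g in granularities:
        seg_len = max(1, T // g)
        for start in range(0, T, seg_len):
            end = min(start + seg_len, T)
            # Align the hypothesis segment to a proportional reference span.
            ref_start = int(start / T * len(ref))
            ref_end = int(end / T * len(ref))
            reward = segment_f1(hyp[start:end].tolist(), ref[ref_start:ref_end])
            loss = loss - reward * token_logp[start:end].sum()
    return loss / len(granularities)


if __name__ == "__main__":
    torch.manual_seed(0)
    vocab, T = 32, 8
    logits = torch.randn(T, vocab, requires_grad=True)
    log_probs = F.log_softmax(logits, dim=-1)
    hyp = log_probs.argmax(dim=-1)                 # greedy hypothesis from the decoder
    ref = torch.randint(0, vocab, (10,)).tolist()  # toy reference
    loss = multi_granularity_loss(log_probs, hyp, ref)
    loss.backward()
    print("multi-granularity loss:", loss.item())
```

The intent of segment-level rewards, as opposed to token-level cross-entropy, is that correct words and phrases can still be credited when their positions shift, which a strict position-wise match would penalize.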
