Using Perturbed Length-aware Positional Encoding for Non-autoregressive Neural Machine Translation

Non-autoregressive neural machine translation (NAT) usually employs sequence-level knowledge distillation with an autoregressive neural machine translation (AT) model as its teacher. However, a NAT model often outputs shorter sentences than an AT model. In this work, we propose sequence-level knowledge distillation (SKD) using perturbed length-aware positional encoding and apply it to a student model, the Levenshtein Transformer. Our method outperformed a standard Levenshtein Transformer by up to 2.5 bilingual evaluation understudy (BLEU) points on WMT14 German-to-English translation. The resulting NAT model also produced longer sentences than the baseline NAT models.
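To make the idea concrete, below is a minimal sketch of a length-aware positional encoding with length perturbation. It assumes the length-difference formulation of Takase and Okazaki [14], which encodes the number of remaining tokens rather than the absolute position, and a uniform integer noise scheme for the perturbation during training; the function names, the `max_noise` parameter, and the exact noise distribution are illustrative assumptions, not necessarily the paper's configuration.

```python
import numpy as np

def length_aware_positional_encoding(length: int, d_model: int) -> np.ndarray:
    """Length-difference positional encoding (LDPE) sketch.

    Encodes the number of remaining tokens (length - pos) instead of
    the absolute position, so the decoder can anticipate where the
    sentence should end. Assumes d_model is even.
    """
    pe = np.zeros((length, d_model))
    positions = np.arange(length)[:, None]        # 0 .. length-1
    remaining = length - positions                # length .. 1
    div_term = np.exp(np.arange(0, d_model, 2) * (-np.log(10000.0) / d_model))
    pe[:, 0::2] = np.sin(remaining * div_term)
    pe[:, 1::2] = np.cos(remaining * div_term)
    return pe

def perturbed_length(true_length: int, max_noise: int = 3,
                     rng=np.random) -> int:
    """Jitter the target length by a random integer offset at training
    time (hypothetical noise scheme) so the decoder tolerates
    length-prediction errors at inference time."""
    noise = rng.randint(-max_noise, max_noise + 1)
    return max(1, true_length + noise)

# Example: encode a 10-token target with a perturbed length constraint.
pe = length_aware_positional_encoding(perturbed_length(10), d_model=512)
```

Training with encodings computed from a perturbed length exposes the decoder to inaccurate length constraints, which is why the distilled student learns not to truncate its output when the length estimate at inference time is too short.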

[1] Geoffrey E. Hinton et al. Distilling the Knowledge in a Neural Network, 2015, arXiv.

[2] Ming-Wei Chang et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.

[3] Toshiaki Nakazawa et al. ASPEC: Asian Scientific Paper Excerpt Corpus, 2016, LREC.

[4] Graham Neubig et al. Understanding Knowledge Distillation in Non-autoregressive Machine Translation, 2020, ICLR.

[5] Myle Ott et al. fairseq: A Fast, Extensible Toolkit for Sequence Modeling, 2019, NAACL.

[6] Changhan Wang et al. Levenshtein Transformer, 2019, NeurIPS.

[7] Jiajun Zhang et al. Addressing the Under-Translation Problem from the Entropy Perspective, 2019, AAAI.

[8] Taku Kudo et al. SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing, 2018, EMNLP.

[9] Matt Post et al. A Call for Clarity in Reporting BLEU Scores, 2018, WMT.

[10] Katsuhito Sudoh et al. Incorporating Noisy Length Constraints into Transformer with Length-aware Positional Encodings, 2020, COLING.

[11] Taku Kudo et al. MeCab: Yet Another Part-of-Speech and Morphological Analyzer, 2005.

[12] Satoshi Nakamura et al. Length-constrained Neural Machine Translation using Length Prediction and Perturbation into Length-aware Positional Encoding, 2021, Journal of Natural Language Processing.

[13] Philipp Koehn et al. Findings of the 2014 Workshop on Statistical Machine Translation, 2014, WMT@ACL.

[14] Naoaki Okazaki et al. Positional Encoding to Control Output Sequence Length, 2019, NAACL.

[15] Marcello Federico et al. Controlling the Output Length of Neural Machine Translation, 2019, IWSLT.

[16] Salim Roukos et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.

[17] Omer Levy et al. Mask-Predict: Parallel Decoding of Conditional Masked Language Models, 2019, EMNLP.