Faster Re-translation Using Non-Autoregressive Model For Simultaneous Neural Machine Translation

Recently, simultaneous translation has attracted considerable attention because it enables compelling applications such as subtitle translation for live events and real-time video-call translation. Some of these applications permit revising previously displayed partial translations, giving rise to re-translation approaches. Current re-translation approaches rely on autoregressive sequence generation models (ReTA), which produce the target tokens of each (partial) translation sequentially. Because every re-translation is generated token by token, the inference-time gap between the incoming source input and the corresponding target output grows as the source grows. Moreover, the large number of inference operations involved makes ReTA models unsuitable for resource-constrained devices. In this work, we propose a faster re-translation system based on a non-autoregressive sequence generation model (FReTNA) to overcome these limitations. We evaluate the proposed model on multiple translation tasks; it substantially reduces inference time while achieving a BLEU score competitive with the ReTA and streaming (Wait-k) models. Specifically, it reduces the average computation time by a factor of 20 compared to the ReTA model while incurring only a small drop in translation quality, and it outperforms the streaming-based Wait-k model in both computation time (1.5 times lower) and translation quality.
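
As a rough illustration of why non-autoregressive re-translation reduces sequential work, the sketch below (not the paper's implementation; the two decoding functions are hypothetical stand-ins) counts the number of sequential decoding steps when every growing source prefix triggers a full re-translation: an autoregressive decoder pays a cost proportional to the target length on every re-translation, whereas a non-autoregressive decoder emits all target positions in parallel and pays only a small, fixed number of refinement passes.

```python
# Minimal sketch, assuming toy stand-in decoders: compares sequential decoding
# steps of autoregressive (ReTA-style) vs non-autoregressive (FReTNA-style)
# re-translation as the source arrives incrementally.

def autoregressive_retranslate(source_prefix, max_len):
    """Regenerate the target token by token: O(target length) sequential steps."""
    steps, target = 0, []
    for _ in range(min(len(source_prefix) + 2, max_len)):
        target.append("<tok>")  # each token conditions on previously generated ones
        steps += 1
    return target, steps

def non_autoregressive_retranslate(source_prefix, refinement_iters=2):
    """Emit all target positions in parallel; cost is a fixed number of refinement passes."""
    target = ["<tok>"] * (len(source_prefix) + 2)
    return target, refinement_iters  # independent of target length

source = "wir haben ein neues modell vorgeschlagen".split()
ar_steps = nat_steps = 0
for t in range(1, len(source) + 1):          # source tokens arrive one at a time
    _, s = autoregressive_retranslate(source[:t], max_len=20)
    ar_steps += s
    _, s = non_autoregressive_retranslate(source[:t])
    nat_steps += s

print(f"sequential steps: autoregressive={ar_steps}, non-autoregressive={nat_steps}")
```

Under these toy assumptions the autoregressive loop accumulates a number of sequential steps that grows quadratically with the source length, while the non-autoregressive loop grows only linearly, which mirrors the inference-time gap described in the abstract.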
