End-to-End Non-Autoregressive Neural Machine Translation with Connectionist Temporal Classification

Autoregressive decoding is the only part of sequence-to-sequence models that prevents them from being massively parallelized at inference time. Non-autoregressive models enable the decoder to generate all output symbols in parallel, independently of each other. We present a novel non-autoregressive architecture based on connectionist temporal classification and evaluate it on the task of neural machine translation. Unlike other non-autoregressive methods, which operate in several steps, our model can be trained end-to-end. We conduct experiments on the WMT English-Romanian and English-German datasets. Our models achieve a significant speedup over autoregressive models while keeping the translation quality comparable to that of other non-autoregressive models.
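
As a rough illustration of the idea (a minimal sketch, not the authors' implementation), the code below shows how a connectionist temporal classification loss can be attached to a parallel decoder: each encoder state is projected to k output slots, every position is predicted independently, and torch.nn.CTCLoss marginalizes over alignments between the parallel predictions and the reference translation. The class name CTCNonAutoregressiveHead, the expansion factor k, and all shapes and hyperparameters here are assumptions made for the example.

```python
# Sketch of a CTC-based non-autoregressive output head (assumed names/shapes).
import torch
import torch.nn as nn

class CTCNonAutoregressiveHead(nn.Module):
    def __init__(self, d_model: int, vocab_size: int, k: int = 2):
        super().__init__()
        self.k = k                                       # output slots per source token
        self.expand = nn.Linear(d_model, k * d_model)    # lengthen the sequence
        self.proj = nn.Linear(d_model, vocab_size + 1)   # +1 for the CTC blank symbol

    def forward(self, enc_states: torch.Tensor) -> torch.Tensor:
        # enc_states: (batch, src_len, d_model) from any encoder, e.g. a Transformer
        b, s, d = enc_states.shape
        expanded = self.expand(enc_states).view(b, s * self.k, d)
        # All positions are predicted in parallel; no autoregressive dependency.
        return self.proj(expanded).log_softmax(-1)       # (batch, k*src_len, vocab+1)

# Training step: CTC sums over all alignments of the parallel predictions
# to the reference, so the model can be trained end-to-end in one pass.
vocab_size = 32000
head = CTCNonAutoregressiveHead(d_model=512, vocab_size=vocab_size, k=2)
ctc_loss = nn.CTCLoss(blank=vocab_size, zero_infinity=True)

enc_states = torch.randn(8, 20, 512)                    # dummy encoder output
targets = torch.randint(0, vocab_size, (8, 25))         # dummy reference token ids
log_probs = head(enc_states).transpose(0, 1)            # (T, batch, vocab+1) for CTCLoss
input_lengths = torch.full((8,), 40, dtype=torch.long)  # k * src_len
target_lengths = torch.full((8,), 25, dtype=torch.long)
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)

# At inference, greedy decoding is an argmax over every position at once,
# followed by collapsing repeated tokens and removing blanks.
```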
