Feedforward sequential memory networks based encoder-decoder model for machine translation

The recurrent neural network based encoder-decoder model has recently become a popular approach to sequence-to-sequence mapping problems such as machine translation. However, training such a model is time-consuming, because the temporal dependency in recurrent neural networks prevents the symbols of a sequence from being processed in parallel. In this paper, we present a sequence-to-sequence model that replaces the recurrent neural networks in both the encoder and the decoder with feedforward sequential memory networks (FSMNs), which enables the new architecture to encode the entire source sentence simultaneously. We also modify the attention module so that the decoder generates its outputs simultaneously during training. On the WMT'14 English-to-French translation task we achieve comparable results while training 1.4 to 2 times faster, owing to the temporal independence of the FSMN-based encoder and decoder.
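To make the parallelism claim concrete, the following is a minimal, illustrative PyTorch sketch of one FSMN layer, not the authors' implementation. It assumes the vectorized FSMN formulation, in which each hidden state is augmented with a learned, element-wise weighted sum of a fixed window of neighbouring hidden states; the names FSMNLayer, lookback, and lookahead, and the choice of a depthwise 1-D convolution to realise the memory block, are illustrative assumptions. Because the memory block is a fixed tap structure rather than a recurrence, every time step of a sentence is computed at once:

# Illustrative sketch of an FSMN layer (not the paper's code).
# Assumption: the memory block is a learned, per-channel weighted sum of
# `lookback` past and `lookahead` future hidden states, implemented here
# as a depthwise 1-D convolution so all time steps run in parallel.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FSMNLayer(nn.Module):
    """One feedforward layer with an FSMN-style memory block (illustrative only)."""

    def __init__(self, input_dim: int, hidden_dim: int,
                 lookback: int = 4, lookahead: int = 4):
        super().__init__()
        self.linear = nn.Linear(input_dim, hidden_dim)
        self.lookback = lookback
        self.lookahead = lookahead
        # One scalar tap weight per (channel, time offset): a depthwise conv.
        kernel = lookback + lookahead + 1
        self.memory = nn.Conv1d(hidden_dim, hidden_dim, kernel_size=kernel,
                                groups=hidden_dim, bias=False)
        self.proj = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, input_dim); all positions are processed at once.
        h = torch.relu(self.linear(x))                      # (B, T, H)
        h_t = h.transpose(1, 2)                             # (B, H, T)
        # Pad so each output sees `lookback` past and `lookahead` future steps.
        h_pad = F.pad(h_t, (self.lookback, self.lookahead))
        mem = self.memory(h_pad).transpose(1, 2)            # (B, T, H)
        # Combine the instantaneous representation with its memory summary.
        return torch.relu(self.proj(h + mem))


if __name__ == "__main__":
    enc = FSMNLayer(input_dim=512, hidden_dim=512)
    src = torch.randn(8, 30, 512)      # 8 source sentences of 30 tokens each
    out = enc(src)                     # (8, 30, 512), no sequential dependency
    print(out.shape)

Stacking such layers yields an encoder (or, with lookahead set to zero, a causal decoder) whose training cost per step does not grow with the sequential bottleneck of an RNN, which is the source of the reported speed-up.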
