Convolutional Sequence to Sequence Learning

The prevalent approach to sequence to sequence learning maps an input sequence to a variable-length output sequence via recurrent neural networks. We introduce an architecture based entirely on convolutional neural networks. Compared to recurrent models, computations over all elements can be fully parallelized during training, and optimization is easier since the number of non-linearities is fixed and independent of the input length. Our use of gated linear units eases gradient propagation, and we equip each decoder layer with a separate attention module. We outperform the accuracy of the deep LSTM setup of Wu et al. (2016) on both WMT'14 English-German and WMT'14 English-French translation while being an order of magnitude faster, on both GPU and CPU.
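Since the abstract names the model's two key components, gated linear units in the convolutional blocks and a separate attention module per decoder layer, a minimal sketch may help make them concrete. The sketch below uses PyTorch rather than the Torch7 environment of the original work, and all names (GatedConvBlock, DecoderAttention, channels) are illustrative assumptions, not the authors' implementation:

```python
# Minimal sketch of the two components named in the abstract, written in
# PyTorch (the original work used Torch7). Names are illustrative only.
import torch
import torch.nn as nn

class GatedConvBlock(nn.Module):
    """Convolution + gated linear unit (GLU) with a residual connection.

    Every position is computed in parallel; the number of non-linearities
    depends only on the network depth, not on the sequence length.
    """
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        # 2x output channels: one half is the candidate output,
        # the other half parameterizes the sigmoid gate.
        self.conv = nn.Conv1d(channels, 2 * channels, kernel_size,
                              padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time)
        a, b = self.conv(x).chunk(2, dim=1)
        return x + a * torch.sigmoid(b)  # GLU gating, then residual

class DecoderAttention(nn.Module):
    """Dot-product attention over encoder states; per the abstract, each
    decoder layer carries its own instance of such a module."""
    def __init__(self, channels: int):
        super().__init__()
        self.proj = nn.Linear(channels, channels)

    def forward(self, dec: torch.Tensor, enc: torch.Tensor) -> torch.Tensor:
        # dec: (batch, tgt_len, channels), enc: (batch, src_len, channels)
        scores = self.proj(dec) @ enc.transpose(1, 2)  # (batch, tgt, src)
        weights = torch.softmax(scores, dim=-1)
        return dec + weights @ enc                     # attended residual

# Usage: stacking such blocks yields the fixed-depth, fully parallel
# computation described above.
block, attn = GatedConvBlock(256), DecoderAttention(256)
enc = block(torch.randn(2, 256, 10))                    # encoder states
dec = attn(torch.randn(2, 7, 256), enc.transpose(1, 2)) # one decoder layer
```

This simplified sketch omits details of the full model, such as causal padding in the decoder convolutions and how source embeddings enter the attention values, but it shows why the computation parallelizes over time steps and why the number of non-linearities depends only on depth.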

[1] Alexander Waibel, et al. Phoneme recognition using time-delay neural networks, 1989, IEEE Trans. Acoust. Speech Signal Process.

[2] Jeffrey L. Elman. Finding Structure in Time, 1990, Cogn. Sci.

[3] Michael A. Arbib, ed. The Handbook of Brain Theory and Neural Networks, 1995, MIT Press.

[4] Sepp Hochreiter, Jürgen Schmidhuber. Long Short-Term Memory, 1997, Neural Computation.

[5] Yann LeCun, Yoshua Bengio. Convolutional networks for images, speech, and time series, 1998.

[6] Mitchell P. Marcus, et al. Treebank-3, 1999, Linguistic Data Consortium.

[7] Chin-Yew Lin. ROUGE: A Package for Automatic Evaluation of Summaries, 2004, ACL.

[8] Paul Over, et al. DUC in context, 2007, Inf. Process. Manag.

[9] Xavier Glorot, Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.

[10] Ronan Collobert, et al. Torch7: A Matlab-like Environment for Machine Learning, 2011, NIPS Workshop.

[11] Mike Schuster, Kaisuke Nakajima. Japanese and Korean voice search, 2012, ICASSP.

[12] Chris Dyer, et al. A Simple, Fast, and Effective Reparameterization of IBM Model 2, 2013, NAACL.

[13] Razvan Pascanu, et al. On the difficulty of training recurrent neural networks, 2013, ICML.

[14] Ilya Sutskever, et al. On the importance of initialization and momentum in deep learning, 2013, ICML.

[15] Kyunghyun Cho, et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, 2014, EMNLP.

[16] Nitish Srivastava, et al. Dropout: a simple way to prevent neural networks from overfitting, 2014, J. Mach. Learn. Res.

[17] Ilya Sutskever, et al. Sequence to Sequence Learning with Neural Networks, 2014, NIPS.

[18] Sébastien Jean, et al. Montreal Neural Machine Translation Systems for WMT'15, 2015, WMT.

[19] Sergey Ioffe, Christian Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.

[20] Sainbayar Sukhbaatar, et al. End-To-End Memory Networks, 2015, NIPS.

[21] Alexander M. Rush, et al. A Neural Attention Model for Abstractive Sentence Summarization, 2015, EMNLP.

[22] Minh-Thang Luong, et al. Effective Approaches to Attention-based Neural Machine Translation, 2015, EMNLP.

[23] Jan Chorowski, et al. Attention-Based Models for Speech Recognition, 2015, NIPS.

[24] Kaiming He, et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, 2015, ICCV.

[25] Fandong Meng, et al. Encoding Source Language with Convolutional Neural Network for Machine Translation, 2015, ACL.

[26] Dzmitry Bahdanau, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2015, ICLR.

[27] Ayana, et al. Neural Headline Generation with Sentence-wise Optimization, 2016.

[28] Aäron van den Oord, et al. Conditional Image Generation with PixelCNN Decoders, 2016, NIPS.

[29] Ondřej Bojar, et al. Findings of the 2016 Conference on Machine Translation, 2016, WMT.

[30] Rico Sennrich, et al. Edinburgh Neural Machine Translation Systems for WMT 16, 2016, WMT.

[31] Rico Sennrich, et al. Neural Machine Translation of Rare Words with Subword Units, 2016, ACL.

[32] Kaiming He, et al. Deep Residual Learning for Image Recognition, 2016, CVPR.

[33] Tim Salimans, Diederik P. Kingma. Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks, 2016, NIPS.

[34] Aäron van den Oord, et al. Pixel Recurrent Neural Networks, 2016, ICML.

[35] Gurvan L'Hostis, et al. Vocabulary Selection Strategies for Neural Machine Translation, 2016, arXiv.

[36] Jie Zhou, et al. Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation, 2016, TACL.

[37] Alexander H. Miller, et al. Key-Value Memory Networks for Directly Reading Documents, 2016, EMNLP.

[38] Haitao Mi, et al. Vocabulary Manipulation for Neural Machine Translation, 2016, ACL.

[39] Yonghui Wu, et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, 2016, arXiv.

[40] Ramesh Nallapati, et al. Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond, 2016, CoNLL.

[41] James Bradbury, et al. Quasi-Recurrent Neural Networks, 2017, ICLR.

[42] Jun Suzuki, Masaaki Nagata. Cutting-off Redundant Repeating Generations for Neural Abstractive Summarization, 2017, EACL.

[43] Noam Shazeer, et al. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, 2017, ICLR.

[44] Yann N. Dauphin, et al. Language Modeling with Gated Convolutional Networks, 2017, ICML.

[45] Zichao Yang, et al. Neural Machine Translation with Recurrent Attention Modeling, 2017, EACL.

[46] Jonas Gehring, et al. A Convolutional Encoder Model for Neural Machine Translation, 2017, ACL.