A Convolutional Encoder Model for Neural Machine Translation

The prevalent approach to neural machine translation relies on bi-directional LSTMs to encode the source sentence. We present a faster and simpler architecture based on a succession of convolutional layers. This allows the entire source sentence to be encoded simultaneously, in contrast to recurrent networks, whose computation is constrained by temporal dependencies. On WMT’16 English-Romanian translation we achieve accuracy competitive with the state of the art, and on WMT’15 English-German we outperform several recently published results. Our models obtain almost the same accuracy as a very deep LSTM setup on WMT’14 English-French translation. We also speed up CPU decoding by more than a factor of two while matching or exceeding the accuracy of a strong bi-directional LSTM.
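To make the core idea concrete, below is a minimal sketch of a convolutional sentence encoder in PyTorch. It illustrates why every source position can be processed in parallel: each layer is a same-padded 1-D convolution over the whole sequence, with no per-timestep recurrence. The layer count, kernel size, embedding width, activation, and residual wiring here are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ConvEncoder(nn.Module):
    """Minimal convolutional sentence encoder sketch.

    All positions are computed in parallel, unlike an RNN whose
    state at step t depends on step t-1.
    """
    def __init__(self, vocab_size, emb_dim=256, num_layers=6, kernel_size=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Stack of same-padding 1-D convolutions over the time axis.
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, emb_dim, kernel_size, padding=kernel_size // 2)
            for _ in range(num_layers)
        )
        self.activation = nn.Tanh()

    def forward(self, tokens):              # tokens: (batch, seq_len)
        x = self.embed(tokens)              # (batch, seq_len, emb_dim)
        x = x.transpose(1, 2)               # Conv1d wants (batch, channels, seq_len)
        for conv in self.convs:
            # Residual connection to ease training of deeper stacks
            # (an assumption here, not necessarily the paper's wiring).
            x = x + self.activation(conv(x))
        return x.transpose(1, 2)            # (batch, seq_len, emb_dim)

# Usage: encode a batch of two 5-token "sentences".
enc = ConvEncoder(vocab_size=1000)
out = enc(torch.randint(0, 1000, (2, 5)))
print(out.shape)                            # torch.Size([2, 5, 256])
```

Because no hidden state is threaded through time, the receptive field grows with depth instead: with kernel size k, a stack of n layers lets each output position see roughly n·(k−1)+1 source tokens.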
