Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation

In this paper, we propose a novel neural network model called RNN Encoder‐ Decoder that consists of two recurrent neural networks (RNN). One RNN encodes a sequence of symbols into a fixedlength vector representation, and the other decodes the representation into another sequence of symbols. The encoder and decoder of the proposed model are jointly trained to maximize the conditional probability of a target sequence given a source sequence. The performance of a statistical machine translation system is empirically found to improve by using the conditional probabilities of phrase pairs computed by the RNN Encoder‐Decoder as an additional feature in the existing log-linear model. Qualitatively, we show that the proposed model learns a semantically and syntactically meaningful representation of linguistic phrases.

[1]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[2]  Yoshua Bengio,et al.  A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res..

[3]  Daniel Marcu,et al.  A Phrase-Based,Joint Probability Model for Statistical Machine Translation , 2002, EMNLP.

[4]  Daniel Marcu,et al.  Statistical Phrase-Based Translation , 2003, NAACL.

[5]  Philipp Koehn,et al.  Europarl: A Parallel Corpus for Statistical Machine Translation , 2005, MTSUMMIT.

[6]  Marta R. Costa-jussà,et al.  Continuous space language models for the IWSLT 2006 task , 2006, IWSLT.

[7]  Holger Schwenk,et al.  Continuous space language models , 2007, Comput. Speech Lang..

[8]  Alex Graves,et al.  Supervised Sequence Labelling with Recurrent Neural Networks , 2012, Studies in Computational Intelligence.

[9]  William D. Lewis,et al.  Intelligent Selection of Language Model Training Data , 2010, ACL.

[10]  Jianfeng Gao,et al.  Domain Adaptation via Pseudo In-Domain Data Selection , 2011, EMNLP.

[11]  Yoshua Bengio,et al.  Deep Sparse Rectifier Neural Networks , 2011, AISTATS.

[12]  Jeffrey Pennington,et al.  Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection , 2011, NIPS.

[13]  Alexandre Allauzen,et al.  Continuous Space Translation Models with Neural Networks , 2012, NAACL.

[14]  Holger Schwenk,et al.  Continuous Space Translation Models for Phrase-Based Statistical Machine Translation , 2012, COLING.

[15]  Dong Yu,et al.  Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[16]  Razvan Pascanu,et al.  Theano: new features and speed improvements , 2012, ArXiv.

[17]  Matthew D. Zeiler ADADELTA: An Adaptive Learning Rate Method , 2012, ArXiv.

[18]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[19]  Christopher D. Manning,et al.  Bilingual Word Embeddings for Phrase-Based Machine Translation , 2013, EMNLP.

[20]  Geoffrey Zweig,et al.  Joint Language and Translation Modeling with Recurrent Neural Networks , 2013, EMNLP.

[21]  Laurens van der Maaten,et al.  Barnes-Hut-SNE , 2013, ICLR.

[22]  Ashish Vaswani,et al.  Decoding with Large-Scale Neural Language Models Improves Translation , 2013, EMNLP.

[23]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[24]  Phil Blunsom,et al.  Recurrent Continuous Translation Models , 2013, EMNLP.

[25]  Yoshua Bengio,et al.  Maxout Networks , 2013, ICML.

[26]  Jianfeng Gao,et al.  Learning Semantic Representations for the Phrase Translation Model , 2013, ArXiv.

[27]  Razvan Pascanu,et al.  Advances in optimizing recurrent networks , 2012, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[28]  Richard M. Schwartz,et al.  Fast and Robust Neural Network Joint Models for Statistical Machine Translation , 2014, ACL.

[29]  Razvan Pascanu,et al.  How to Construct Deep Recurrent Neural Networks , 2013, ICLR.

[30]  Hugo Larochelle,et al.  An Autoencoder Approach to Learning Bilingual Word Representations , 2014, NIPS.

[31]  Surya Ganguli,et al.  Exact solutions to the nonlinear dynamics of learning in deep linear neural networks , 2013, ICLR.