On the Relation between Position Information and Sentence Length in Neural Machine Translation

Long sentences have been one of the major challenges in neural machine translation (NMT). Although approaches such as the attention mechanism have partially remedied the problem, we found that the current standard NMT model, the Transformer, has more difficulty translating long sentences than the former standard, the Recurrent Neural Network (RNN)-based model. One of the key differences between these NMT models is how they handle position information, which is essential for processing sequential data. In this study, we focus on the type of position information used by NMT models and hypothesize that relative position is more suitable than absolute position. To examine this hypothesis, we propose the RNN-Transformer, which replaces the positional encoding layer of the Transformer with an RNN, and compare the RNN-based model with four variants of the Transformer. Experiments on the ASPEC English-to-Japanese and WMT2014 English-to-German translation tasks demonstrate that relative position helps in translating sentences longer than those in the training data. Further experiments on length-controlled training data reveal that absolute position actually causes overfitting to sentence length.
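
As a concrete illustration of the proposed replacement, the following is a minimal PyTorch sketch of the encoder-side idea: the sinusoidal (absolute) positional encoding layer is swapped for an RNN over the token embeddings, so position is conveyed only through recurrence rather than explicit absolute indices. The layer sizes, the use of a single-layer LSTM, and the class name `RNNTransformerEncoder` are illustrative assumptions, not the exact configuration used in the paper's experiments.

```python
# Minimal sketch of the "RNN-Transformer" encoder idea: token embeddings are
# fed through an RNN instead of adding absolute positional encodings, and the
# result goes into a standard Transformer encoder stack.
# Sizes and the single-layer LSTM are illustrative assumptions.
import torch
import torch.nn as nn


class RNNTransformerEncoder(nn.Module):
    def __init__(self, vocab_size: int, d_model: int = 512,
                 nhead: int = 8, num_layers: int = 6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # RNN in place of the positional encoding layer (assumed: LSTM).
        self.pos_rnn = nn.LSTM(d_model, d_model, batch_first=True)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers)

    def forward(self, src_tokens: torch.Tensor) -> torch.Tensor:
        x = self.embed(src_tokens)      # (batch, seq_len, d_model)
        x, _ = self.pos_rnn(x)          # position injected via recurrence only
        return self.encoder(x)          # standard self-attention stack


# Toy usage: a batch of 2 token sequences of length 7.
model = RNNTransformerEncoder(vocab_size=1000)
tokens = torch.randint(0, 1000, (2, 7))
out = model(tokens)                     # shape: (2, 7, 512)
```

Because the recurrence depends only on how far a token is from the start of its own sentence as processed step by step, the encoder has no explicit notion of absolute index, which is the property the paper attributes to better generalization to sentences longer than those seen in training.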
