Hard but Robust, Easy but Sensitive: How Encoder and Decoder Perform in Neural Machine Translation

Neural machine translation (NMT) typically adopts the encoder-decoder framework. A good understanding of the characteristics and functionalities of the encoder and decoder can help explain the pros and cons of the framework and guide the design of better NMT models. In this work, we conduct an empirical study of the encoder and the decoder in NMT, taking the Transformer as an example. We find that 1) the decoder handles an easier task than the encoder, 2) the decoder is more sensitive to input noise than the encoder, and 3) the preceding words/tokens fed to the decoder provide strong conditional information, which accounts for the two observations above. We hope these observations shed light on the characteristics of the encoder and decoder and inspire future research on NMT.
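
Below is a minimal sketch (not the authors' code) of the kind of probe the abstract describes: inject token-level noise into either the encoder input (the source sentence) or the decoder input (the teacher-forced target prefix) and compare how much the model's outputs shift. The model size, vocabulary, sequence lengths, and noise rate are illustrative assumptions, and a standard PyTorch Transformer stands in for the trained NMT model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes; a real study would use a trained NMT model.
VOCAB, D_MODEL, SRC_LEN, TGT_LEN, BATCH = 1000, 64, 12, 10, 4

embed = nn.Embedding(VOCAB, D_MODEL)
model = nn.Transformer(d_model=D_MODEL, nhead=4,
                       num_encoder_layers=2, num_decoder_layers=2,
                       dim_feedforward=128, batch_first=True)
proj = nn.Linear(D_MODEL, VOCAB)

def logits(src_ids, tgt_ids):
    # Causal mask so the decoder only conditions on preceding target tokens.
    tgt_mask = model.generate_square_subsequent_mask(tgt_ids.size(1))
    out = model(embed(src_ids), embed(tgt_ids), tgt_mask=tgt_mask)
    return proj(out)

def add_noise(ids, rate=0.3):
    # Replace a fraction of tokens with random vocabulary items.
    noisy = ids.clone()
    mask = torch.rand_like(ids, dtype=torch.float) < rate
    noisy[mask] = torch.randint(0, VOCAB, (int(mask.sum()),))
    return noisy

src = torch.randint(0, VOCAB, (BATCH, SRC_LEN))
tgt = torch.randint(0, VOCAB, (BATCH, TGT_LEN))

with torch.no_grad():
    clean = logits(src, tgt)
    enc_noised = logits(add_noise(src), tgt)   # noise on the encoder side
    dec_noised = logits(src, add_noise(tgt))   # noise on the decoder side

# A larger shift under decoder-side noise would echo observation (2):
# the decoder is more sensitive to input noise than the encoder.
print("encoder-noise shift:", (clean - enc_noised).abs().mean().item())
print("decoder-noise shift:", (clean - dec_noised).abs().mean().item())
```

With an untrained model the printed numbers are not meaningful; the sketch only shows the shape of the comparison, with the two noising sites differing solely in whether the source or the teacher-forced prefix is corrupted.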
