Understanding Neural Machine Translation by Simplification: The Case of Encoder-free Models

In this paper, we aim to understand neural machine translation (NMT) by simplifying NMT architectures and training encoder-free NMT models. In an encoder-free model, the source is represented only by the sums of word embeddings and positional embeddings, and the decoder, a standard Transformer or recurrent neural network, attends directly to these embeddings through its attention mechanism. Experimental results show (1) that the attention mechanism in encoder-free models acts as a strong feature extractor, (2) that the word embeddings learned by encoder-free models are competitive with those of conventional models, (3) that non-contextualized source representations lead to a substantial drop in translation quality, and (4) that encoder-free models affect alignment quality differently for German-English and Chinese-English.
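
As a rough illustration of the architecture described above, the following is a minimal PyTorch sketch of an encoder-free model: the source is represented only by word embeddings plus sinusoidal positional embeddings, and a standard Transformer decoder cross-attends to them directly. All module names, hyperparameters, and the choice of PyTorch are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of an encoder-free NMT model (illustrative, not the paper's code).
import math
import torch
import torch.nn as nn


def sinusoidal_positions(max_len, d_model):
    """Standard sinusoidal positional embeddings (Vaswani et al., 2017)."""
    pos = torch.arange(max_len).unsqueeze(1).float()
    div = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe


class EncoderFreeNMT(nn.Module):
    """No encoder: the decoder cross-attends to raw source embeddings."""

    def __init__(self, src_vocab, tgt_vocab, d_model=512, nhead=8, num_layers=6, max_len=1024):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d_model)
        self.tgt_emb = nn.Embedding(tgt_vocab, d_model)
        self.register_buffer("pe", sinusoidal_positions(max_len, d_model))
        layer = nn.TransformerDecoderLayer(d_model, nhead)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.out = nn.Linear(d_model, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Non-contextualized source representation: word + positional embeddings only.
        src = self.src_emb(src_ids) + self.pe[: src_ids.size(0)].unsqueeze(1)
        tgt = self.tgt_emb(tgt_ids) + self.pe[: tgt_ids.size(0)].unsqueeze(1)
        # Causal mask so target positions cannot attend to future tokens.
        t = tgt_ids.size(0)
        tgt_mask = torch.triu(torch.full((t, t), float("-inf"), device=tgt.device), diagonal=1)
        # Cross-attention in every decoder layer attends directly to the source embeddings.
        hidden = self.decoder(tgt, memory=src, tgt_mask=tgt_mask)
        return self.out(hidden)


# Toy usage; inputs are (seq_len, batch), the default layout for nn.TransformerDecoder.
model = EncoderFreeNMT(src_vocab=1000, tgt_vocab=1000)
src = torch.randint(0, 1000, (7, 2))
tgt = torch.randint(0, 1000, (5, 2))
logits = model(src, tgt)
print(logits.shape)  # torch.Size([5, 2, 1000])
```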
