Understanding Neural Machine Translation by Simplification: The Case of Encoder-free Models

In this paper, we aim to understand neural machine translation (NMT) by simplifying NMT architectures and training encoder-free NMT models. In an encoder-free model, the source is represented only by the sums of word embeddings and positional embeddings, and the decoder, a standard Transformer or recurrent neural network, attends directly to these embeddings through its attention mechanism. Experimental results show (1) that the attention mechanism in encoder-free models acts as a strong feature extractor, (2) that the word embeddings learned by encoder-free models are competitive with those of conventional models, (3) that non-contextualized source representations lead to a substantial drop in translation quality, and (4) that encoder-free models affect alignment quality differently for German-English and Chinese-English.
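
As a rough illustration of the architecture described above, the following is a minimal PyTorch sketch of an encoder-free model: the source is represented only by word embeddings plus sinusoidal positional embeddings, and a standard Transformer decoder cross-attends to them directly. All module names, hyperparameters, and the choice of PyTorch are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of an encoder-free NMT model (illustrative, not the paper's code).
import math
import torch
import torch.nn as nn


def sinusoidal_positions(max_len, d_model):
    """Standard sinusoidal positional embeddings (Vaswani et al., 2017)."""
    pos = torch.arange(max_len).unsqueeze(1).float()
    div = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe


class EncoderFreeNMT(nn.Module):
    """No encoder: the decoder cross-attends to raw source embeddings."""

    def __init__(self, src_vocab, tgt_vocab, d_model=512, nhead=8, num_layers=6, max_len=1024):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d_model)
        self.tgt_emb = nn.Embedding(tgt_vocab, d_model)
        self.register_buffer("pe", sinusoidal_positions(max_len, d_model))
        layer = nn.TransformerDecoderLayer(d_model, nhead)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.out = nn.Linear(d_model, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Non-contextualized source representation: word + positional embeddings only.
        src = self.src_emb(src_ids) + self.pe[: src_ids.size(0)].unsqueeze(1)
        tgt = self.tgt_emb(tgt_ids) + self.pe[: tgt_ids.size(0)].unsqueeze(1)
        # Causal mask so target positions cannot attend to future tokens.
        t = tgt_ids.size(0)
        tgt_mask = torch.triu(torch.full((t, t), float("-inf"), device=tgt.device), diagonal=1)
        # Cross-attention in every decoder layer attends directly to the source embeddings.
        hidden = self.decoder(tgt, memory=src, tgt_mask=tgt_mask)
        return self.out(hidden)


# Toy usage; inputs are (seq_len, batch), the default layout for nn.TransformerDecoder.
model = EncoderFreeNMT(src_vocab=1000, tgt_vocab=1000)
src = torch.randint(0, 1000, (7, 2))
tgt = torch.randint(0, 1000, (5, 2))
logits = model(src, tgt)
print(logits.shape)  # torch.Size([5, 2, 1000])
```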
