Encoders Help You Disambiguate Word Senses in Neural Machine Translation

Neural machine translation (NMT) has set a new state of the art in translating ambiguous words. However, it is still unclear which model component dominates the disambiguation process. In this paper, we explore the ability of NMT encoders and decoders to disambiguate word senses by evaluating their hidden states and by investigating the distributions of self-attention. We train a classifier to predict whether a translation is correct given the representation of an ambiguous noun. We find that encoder hidden states significantly outperform word embeddings, which indicates that encoders adequately encode the information relevant for disambiguation into their hidden states. In contrast to encoders, the effect of the decoder differs between architectures. Moreover, attention weights and attention entropy show that self-attention can detect ambiguous nouns and distribute more attention to their context.
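
The abstract compresses two analyses: a probing classifier that predicts translation correctness from a hidden-state representation of an ambiguous noun, and an entropy measure over self-attention distributions. The following is a minimal sketch of both, assuming the hidden states and attention weights have already been extracted from a trained NMT model; the array shapes, variable names, and helper functions here are illustrative assumptions, not the authors' released code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Assumed inputs (hypothetical, for illustration only):
#   reps   - (n_examples, d) array; one hidden-state vector per
#            occurrence of an ambiguous noun
#   labels - (n_examples,) binary array; 1 if the noun was
#            translated correctly, 0 otherwise

def train_probe(reps, labels):
    """Fit a linear probe predicting translation correctness from a
    representation of an ambiguous noun. Higher probe accuracy on
    held-out data suggests the representation encodes more of the
    information needed for disambiguation."""
    probe = LogisticRegression(max_iter=1000)
    probe.fit(reps, labels)
    return probe

def attention_entropy(attn):
    """Shannon entropy (in bits) of one attention distribution, i.e.
    a probability vector over source positions. Low entropy means
    attention is concentrated on few tokens; high entropy means it is
    spread across the context."""
    attn = np.asarray(attn, dtype=np.float64)
    attn = attn / attn.sum()        # renormalise defensively
    nz = attn[attn > 0]             # 0 * log 0 is treated as 0
    return -np.sum(nz * np.log2(nz))
```

Comparing probe accuracy on encoder hidden states against probe accuracy on the corresponding word embeddings, and comparing attention entropy at ambiguous versus unambiguous nouns, mirrors the two comparisons the abstract reports.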
