On The Alignment Problem In Multi-Head Attention-Based Neural Machine Translation

This work investigates the alignment problem in state-of-the-art multi-head attention models based on the transformer architecture. We demonstrate that alignment extraction in transformer models can be improved by augmenting the multi-head source-to-target attention component with an additional alignment head, which is used to compute sharper attention weights. We describe how to use this alignment head while maintaining competitive translation performance. To study the effect of adding the alignment head, we simulate a dictionary-guided translation task, in which the user guides translation using pre-defined dictionary entries. Using the proposed approach, we achieve up to $3.8\%$ BLEU improvement when using the dictionary, compared to $2.4\%$ BLEU in the baseline case. We also propose alignment pruning to speed up decoding in alignment-based neural machine translation (ANMT), which accelerates translation by a factor of $1.8$ without loss in translation quality. We carry out experiments on the shared WMT 2016 English$\to$Romanian news task and the BOLT Chinese$\to$English discussion forum task.
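To make the proposed architecture concrete, the sketch below shows multi-head source-to-target attention augmented with one extra alignment head whose attention weights are supervised toward an external reference alignment (e.g. from a statistical aligner such as GIZA++), in the spirit of guided alignment training. This is a minimal NumPy sketch under stated assumptions, not the paper's implementation: the dimensions, the random projections, the cross-entropy loss, and the choice to discard the alignment head's context vector are all illustrative.

```python
# Minimal NumPy sketch: multi-head cross-attention plus one dedicated
# alignment head. The regular heads produce the context vector; the
# alignment head only produces attention weights, which serve as a soft
# word alignment and can be supervised with a reference alignment.
# All names and dimensions are illustrative assumptions.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention; returns context and weights."""
    d_k = q.shape[-1]
    weights = softmax(q @ k.T / np.sqrt(d_k))           # (tgt_len, src_len)
    return weights @ v, weights

def multi_head_with_alignment_head(x_tgt, x_src, proj, align_proj):
    """Regular heads build the context; the extra alignment head's
    weight matrix is returned as the alignment. Its context vector is
    discarded here for simplicity (an assumption of this sketch)."""
    contexts = []
    for (Wq, Wk, Wv) in proj:                           # regular heads
        ctx, _ = attention(x_tgt @ Wq, x_src @ Wk, x_src @ Wv)
        contexts.append(ctx)
    Wq_a, Wk_a = align_proj                             # alignment head
    _, align_weights = attention(x_tgt @ Wq_a, x_src @ Wk_a, x_src)
    return np.concatenate(contexts, axis=-1), align_weights

def guided_alignment_loss(align_weights, ref_alignment, eps=1e-9):
    """Cross entropy between the alignment head's weights and a 0/1
    reference alignment matrix (one source position per target word)."""
    return -(ref_alignment * np.log(align_weights + eps)).sum() / len(ref_alignment)

rng = np.random.default_rng(0)
src_len, tgt_len, d_model, n_heads = 5, 4, 16, 4
d_head = d_model // n_heads
x_src = rng.normal(size=(src_len, d_model))
x_tgt = rng.normal(size=(tgt_len, d_model))
proj = [tuple(rng.normal(size=(d_model, d_head)) for _ in range(3))
        for _ in range(n_heads)]
align_proj = tuple(rng.normal(size=(d_model, d_head)) for _ in range(2))

context, align = multi_head_with_alignment_head(x_tgt, x_src, proj, align_proj)
ref = np.zeros((tgt_len, src_len))
ref[np.arange(tgt_len), [0, 1, 2, 4]] = 1.0             # toy reference alignment
print(context.shape, align.shape, guided_alignment_loss(align, ref))
```

In this setup the alignment head's weight matrix can be read off directly as a soft word alignment; supervising it, rather than averaging over the regular heads, is what would yield the sharper attention weights described above.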

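Alignment pruning can likewise be sketched in a few lines: in ANMT decoding the word prediction is marginalized over source positions, so discarding low-probability alignment points before evaluating the (expensive) lexicon model reduces work per step. The threshold-based pruning rule, the toy models, and the choice not to renormalize the surviving probabilities are assumptions for illustration, not the paper's exact criterion.

```python
# Minimal sketch of alignment pruning for alignment-based NMT decoding.
# Only source positions whose alignment probability is within a factor
# `threshold` of the best one survive; the word model is then evaluated
# for those positions only. Names and the threshold value are
# illustrative assumptions.
import numpy as np

def prune_alignment_points(align_probs, threshold=0.1):
    """Keep source positions j with p(j) >= threshold * max_j p(j)."""
    return np.flatnonzero(align_probs >= threshold * align_probs.max())

def step_score(align_probs, word_logprob_fn, token_id, threshold=0.1):
    """Approximate log p(e_i | ...) = log sum_j p(j) * p(e_i | j, ...),
    summing only over the surviving alignment points."""
    kept = prune_alignment_points(align_probs, threshold)
    total = 0.0
    for j in kept:  # word model evaluated only for kept positions
        total += align_probs[j] * np.exp(word_logprob_fn(j, token_id))
    return np.log(total)

# Toy example: a peaked alignment distribution over 6 source positions.
align_probs = np.array([0.01, 0.02, 0.85, 0.08, 0.03, 0.01])
fake_word_model = lambda j, e: -float(abs(j - 2))   # hypothetical lexicon score
print("kept positions:", prune_alignment_points(align_probs))  # only the peak survives
print("pruned score:", step_score(align_probs, fake_word_model, token_id=7))
```

The sharper the alignment head's distribution, the more positions such a rule can safely discard, which is consistent with the reported $1.8\times$ decoding speed-up at unchanged translation quality.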