Self-Attention with Cross-Lingual Position Representation

Position encoding (PE), an essential component of self-attention networks (SANs), preserves word order information for natural language processing tasks by generating fixed position indices for input sequences. In cross-lingual scenarios such as machine translation, however, the PEs of the source and target sentences are modeled independently. Because word order diverges across languages, explicitly modeling cross-lingual positional relationships may help SANs bridge this gap. In this paper, we augment SANs with \emph{cross-lingual position representations} to model a bilingually aware latent structure for the input sentence. Specifically, we exploit bracketing transduction grammar (BTG)-based reordering information to encourage SANs to learn bilingual diagonal alignments. Experimental results on WMT'14 English$\Rightarrow$German, WAT'17 Japanese$\Rightarrow$English, and WMT'17 Chinese$\Leftrightarrow$English translation tasks demonstrate that our approach significantly and consistently improves translation quality over strong baselines. Extensive analyses confirm that the performance gains come from the cross-lingual information.
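
To make the mechanism concrete, below is a minimal PyTorch sketch of how a reorder-aware position signal could be combined with the standard sinusoidal encoding. It assumes the BTG-based reordered indices (source positions permuted into target-like order) are pre-computed by an external reordering model; the gating fusion and names such as `XlPositionalEncoding` and `reordered_pos` are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch only: fuse original-order and BTG-reordered position encodings.
# Assumes d_model is even and reordered_pos comes from an external reorderer.
import math
import torch
import torch.nn as nn


def sinusoidal_encoding(positions: torch.Tensor, d_model: int) -> torch.Tensor:
    """Standard Transformer sinusoidal encoding for arbitrary position indices.

    positions: (batch, seq_len) integer tensor of position indices.
    Returns: (batch, seq_len, d_model) encoding.
    """
    device = positions.device
    div_term = torch.exp(
        torch.arange(0, d_model, 2, device=device, dtype=torch.float)
        * (-math.log(10000.0) / d_model)
    )
    angles = positions.unsqueeze(-1).float() * div_term  # (batch, seq_len, d_model/2)
    enc = torch.zeros(*positions.shape, d_model, device=device)
    enc[..., 0::2] = torch.sin(angles)
    enc[..., 1::2] = torch.cos(angles)
    return enc


class XlPositionalEncoding(nn.Module):
    """Hypothetical module mixing monolingual and cross-lingual position signals."""

    def __init__(self, d_model: int):
        super().__init__()
        # A learned gate decides, per token and dimension, how much of the
        # reordered (cross-lingual) signal to mix into the original one.
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, token_emb, original_pos, reordered_pos):
        # token_emb: (batch, seq_len, d_model) word embeddings
        # original_pos / reordered_pos: (batch, seq_len) position indices
        mono = sinusoidal_encoding(original_pos, token_emb.size(-1))
        cross = sinusoidal_encoding(reordered_pos, token_emb.size(-1))
        g = torch.sigmoid(self.gate(torch.cat([mono, cross], dim=-1)))
        fused = g * mono + (1.0 - g) * cross
        return token_emb + fused
```

For example, for a Japanese source sentence whose BTG-reordered, English-like order moves the verb forward, `reordered_pos` would carry those permuted indices, so the encoder sees both where a word sits in the source and where it is expected to land in the target order. The gated interpolation is one plausible fusion choice; the paper's actual combination scheme may differ.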
