Self-Attention with Cross-Lingual Position Representation

Position encoding (PE), an essential component of self-attention networks (SANs), preserves word order information for natural language processing tasks by generating fixed position indices for input sequences. In cross-lingual scenarios such as machine translation, however, the PEs of the source and target sentences are modeled independently. Because word order diverges across languages, explicitly modeling cross-lingual positional relationships may help SANs bridge this gap. In this paper, we augment SANs with \emph{cross-lingual position representations} to model a bilingually aware latent structure for the input sentence. Specifically, we exploit bracketing transduction grammar (BTG)-based reordering information to encourage SANs to learn bilingual diagonal alignments. Experimental results on WMT'14 English$\Rightarrow$German, WAT'17 Japanese$\Rightarrow$English, and WMT'17 Chinese$\Leftrightarrow$English translation tasks demonstrate that our approach significantly and consistently improves translation quality over strong baselines. Extensive analyses confirm that the performance gains come from the cross-lingual information.
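
To make the mechanism concrete, below is a minimal PyTorch sketch of how a reorder-aware position signal could be combined with the standard sinusoidal encoding. It assumes the BTG-based reordered indices (source positions permuted into target-like order) are pre-computed by an external reordering model; the gating fusion and names such as `XlPositionalEncoding` and `reordered_pos` are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch only: fuse original-order and BTG-reordered position encodings.
# Assumes d_model is even and reordered_pos comes from an external reorderer.
import math
import torch
import torch.nn as nn


def sinusoidal_encoding(positions: torch.Tensor, d_model: int) -> torch.Tensor:
    """Standard Transformer sinusoidal encoding for arbitrary position indices.

    positions: (batch, seq_len) integer tensor of position indices.
    Returns: (batch, seq_len, d_model) encoding.
    """
    device = positions.device
    div_term = torch.exp(
        torch.arange(0, d_model, 2, device=device, dtype=torch.float)
        * (-math.log(10000.0) / d_model)
    )
    angles = positions.unsqueeze(-1).float() * div_term  # (batch, seq_len, d_model/2)
    enc = torch.zeros(*positions.shape, d_model, device=device)
    enc[..., 0::2] = torch.sin(angles)
    enc[..., 1::2] = torch.cos(angles)
    return enc


class XlPositionalEncoding(nn.Module):
    """Hypothetical module mixing monolingual and cross-lingual position signals."""

    def __init__(self, d_model: int):
        super().__init__()
        # A learned gate decides, per token and dimension, how much of the
        # reordered (cross-lingual) signal to mix into the original one.
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, token_emb, original_pos, reordered_pos):
        # token_emb: (batch, seq_len, d_model) word embeddings
        # original_pos / reordered_pos: (batch, seq_len) position indices
        mono = sinusoidal_encoding(original_pos, token_emb.size(-1))
        cross = sinusoidal_encoding(reordered_pos, token_emb.size(-1))
        g = torch.sigmoid(self.gate(torch.cat([mono, cross], dim=-1)))
        fused = g * mono + (1.0 - g) * cross
        return token_emb + fused
```

For example, for a Japanese source sentence whose BTG-reordered, English-like order moves the verb forward, `reordered_pos` would carry those permuted indices, so the encoder sees both where a word sits in the source and where it is expected to land in the target order. The gated interpolation is one plausible fusion choice; the paper's actual combination scheme may differ.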
