Explicit Reordering for Neural Machine Translation

In Transformer-based neural machine translation (NMT), the positional encoding mechanism helps the self-attention networks learn source representations with order dependency, which enables Transformer-based NMT to achieve state-of-the-art results on various translation tasks. However, Transformer-based NMT only adds sequential position representations to the word vectors of the input sentence and does not explicitly model reordering information within the sentence. In this paper, we first empirically investigate the relationship between source reordering information and translation performance. The findings show that reordering the source input into the target-language order learned from bilingual parallel data can substantially improve translation performance. Thus, we propose a novel reordering method to explicitly model this reordering information for Transformer-based NMT. Empirical results on the WMT14 English-to-German, WAT ASPEC Japanese-to-English, and WMT17 Chinese-to-English translation tasks demonstrate the effectiveness of the proposed approach.
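To make the contrast concrete, below is a minimal sketch in PyTorch of the baseline behavior the abstract describes: standard sinusoidal positional encodings computed over sequential indices 0..n-1 and added to the source word embeddings. The second part illustrates, under an assumption, how target-order position indices could be plugged into the same encoding; the specific indices and the `sinusoidal_encoding` helper are invented for illustration and are not the paper's actual reordering model, which learns the ordering from bilingual parallel data.

```python
import torch

def sinusoidal_encoding(positions, d_model):
    """Standard Transformer positional encoding for the given position indices."""
    # positions: (seq_len,) tensor of position indices
    inv_freq = 1.0 / (10000 ** (torch.arange(0, d_model, 2).float() / d_model))
    angles = positions.float().unsqueeze(1) * inv_freq.unsqueeze(0)  # (seq_len, d_model/2)
    pe = torch.zeros(positions.size(0), d_model)
    pe[:, 0::2] = torch.sin(angles)
    pe[:, 1::2] = torch.cos(angles)
    return pe

# Baseline Transformer input: word embeddings plus encodings of sequential positions 0..n-1.
seq_len, d_model = 6, 512
word_emb = torch.randn(seq_len, d_model)   # toy source-word embeddings
seq_positions = torch.arange(seq_len)      # 0, 1, 2, ..., n-1
baseline_input = word_emb + sinusoidal_encoding(seq_positions, d_model)

# Hypothetical reordering-aware variant: encode each source word with the position it
# would occupy in the target-language order. These indices are invented for illustration;
# the paper learns the reordering from the bilingual parallel dataset.
target_order_positions = torch.tensor([2, 0, 1, 4, 5, 3])
reordered_input = word_emb + sinusoidal_encoding(target_order_positions, d_model)
```

The sketch only highlights where the order information enters the model: the baseline ties each word to its surface position, whereas a reordering-aware input would tie it to its expected position on the target side.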
