Incorporating Source Syntax into Transformer-Based Neural Machine Translation

Transformer-based neural machine translation (NMT) has recently achieved state-of-the-art performance on many machine translation tasks. However, recent work (Raganato and Tiedemann, 2018; Tang et al., 2018; Tran et al., 2018) has indicated that Transformer models may not learn syntactic structures as well as their recurrent neural network-based counterparts, particularly in low-resource cases. In this paper, we incorporate constituency parse information into a Transformer NMT model. We leverage linearized parses of the source training sentences in order to inject syntax into the Transformer architecture without modifying it. We introduce two methods: a multi-task machine translation and parsing model with a single encoder and decoder, and a mixed encoder model that learns to translate directly from parsed and unparsed source sentences. We evaluate our methods on low-resource translation from English into twenty target languages, showing consistent improvements of 1.3 BLEU on average across diverse target languages for the multi-task technique. We further evaluate the models on full-scale WMT tasks, finding that the multi-task model aids low- and medium-resource NMT but degrades high-resource English→German translation.
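To make the data setup concrete, the sketch below illustrates how a source-side constituency parse could be linearized and paired with translation data for the two strategies the abstract describes: a multi-task translation-and-parsing model with a shared encoder and decoder, and a mixed encoder model trained on both parsed and unparsed source sentences. This is a minimal sketch, not the authors' implementation: the task tags, linearization format, helper names, and use of NLTK are illustrative assumptions.

```python
# Minimal sketch (assumed formats, not the paper's exact pipeline) of building
# training examples from a source sentence, its target translation, and its
# constituency parse.

from nltk.tree import Tree  # NLTK used here only for convenient parse handling


def linearize(parse) -> str:
    """Flatten a constituency tree into a bracketed token sequence,
    e.g. '(S (NP the cat ) (VP sat ) )'."""
    if isinstance(parse, str):  # leaf node: a source word
        return parse
    children = " ".join(linearize(child) for child in parse)
    return f"({parse.label()} {children} )"


def multi_task_examples(src: str, tgt: str, parse: Tree):
    """Multi-task setup: a single encoder-decoder sees both translation and
    parsing examples, distinguished by a (hypothetical) task tag."""
    return [
        ("<2trans> " + src, tgt),                # translation example
        ("<2parse> " + src, linearize(parse)),   # parsing example
    ]


def mixed_encoder_examples(src: str, tgt: str, parse: Tree):
    """Mixed encoder setup: the model learns to translate to the same target
    from both the plain and the linearized-parse version of the source."""
    return [
        (src, tgt),                # unparsed source -> target
        (linearize(parse), tgt),   # parsed source   -> target
    ]


if __name__ == "__main__":
    tree = Tree.fromstring("(S (NP the cat) (VP sat))")
    for pair in multi_task_examples("the cat sat", "die Katze saß", tree):
        print(pair)
    for pair in mixed_encoder_examples("the cat sat", "die Katze saß", tree):
        print(pair)
```

Because both strategies only change the training data, the Transformer architecture itself is left unmodified, as noted in the abstract.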

[1] Philipp Koehn, et al. Europarl: A Parallel Corpus for Statistical Machine Translation, 2005, MT Summit.

[2] Quoc V. Le, et al. Multi-task Sequence to Sequence Learning, 2015, ICLR.

[3] Matt Post, et al. Sockeye: A Toolkit for Neural Machine Translation, 2018, ArXiv.

[4] Guodong Zhou, et al. Modeling Source Syntax for Neural Machine Translation, 2017, ACL.

[5] Philipp Koehn, et al. Findings of the 2018 Conference on Machine Translation (WMT18), 2018, WMT.

[6] Meishan Zhang, et al. Syntax-Enhanced Neural Machine Translation with Syntax-Aware Word Representations, 2019, NAACL.

[7] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.

[8] Rico Sennrich, et al. Predicting Target Language CCG Supertags Improves Neural Machine Translation, 2017, WMT.

[9] Rico Sennrich, et al. Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures, 2018, EMNLP.

[10] Khalil Sima'an, et al. Graph Convolutional Encoders for Syntax-aware Neural Machine Translation, 2017, EMNLP.

[11] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.

[12] Christopher D. Manning, et al. Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks, 2015, ACL.

[13] Yoshua Bengio, et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, 2014, EMNLP.

[14] Philipp Koehn, et al. Moses: Open Source Toolkit for Statistical Machine Translation, 2007, ACL.

[15] Christof Monz, et al. The Importance of Being Recurrent for Modeling Hierarchical Structure, 2018, EMNLP.

[16] Philipp Koehn, et al. Findings of the 2017 Conference on Machine Translation (WMT17), 2017, WMT.

[17] Eliyahu Kiperwasser, et al. Scheduled Multi-Task Learning: From Syntax to Translation, 2018, TACL.

[18] Martin Wattenberg, et al. Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation, 2016, TACL.

[19] Beatrice Santorini, et al. Building a Large Annotated Corpus of English: The Penn Treebank, 1993, CL.

[20] Rico Sennrich, et al. Neural Machine Translation of Rare Words with Subword Units, 2015, ACL.

[21] Geoffrey E. Hinton, et al. Grammar as a Foreign Language, 2014, NIPS.

[22] Mihai Surdeanu, et al. The Stanford CoreNLP Natural Language Processing Toolkit, 2014, ACL.

[23] Yoav Goldberg, et al. Towards String-To-Tree Neural Machine Translation, 2017, ACL.

[24] Ryan Cotterell, et al. Are All Languages Equally Hard to Language-Model?, 2018, NAACL.

[25] Karin M. Verspoor, et al. Findings of the 2016 Conference on Machine Translation, 2016, WMT.

[26] Jörg Tiedemann, et al. An Analysis of Encoder Representations in Transformer-Based Machine Translation, 2018, BlackboxNLP@EMNLP.

[27] Nan Yang, et al. Dependency-to-Dependency Neural Machine Translation, 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[28] Yoshimasa Tsuruoka, et al. Tree-to-Sequence Attentional Neural Machine Translation, 2016, ACL.

[29] Eugene Charniak, et al. Parsing as Language Modeling, 2016, EMNLP.

[30] Kenneth Heafield, et al. Multi-Source Syntactic Neural Machine Translation, 2018, EMNLP.