Incorporating Source Syntax into Transformer-Based Neural Machine Translation

Transformer-based neural machine translation (NMT) has recently achieved state-of-the-art performance on many machine translation tasks. However, recent work (Raganato and Tiedemann, 2018; Tang et al., 2018; Tran et al., 2018) has indicated that Transformer models may not learn syntactic structures as well as their recurrent neural network-based counterparts, particularly in low-resource cases. In this paper, we incorporate constituency parse information into a Transformer NMT model. We leverage linearized parses of the source training sentences in order to inject syntax into the Transformer architecture without modifying it. We introduce two methods: a multi-task machine translation and parsing model with a single encoder and decoder, and a mixed encoder model that learns to translate directly from parsed and unparsed source sentences. We evaluate our methods on low-resource translation from English into twenty target languages, showing consistent improvements of 1.3 BLEU on average across diverse target languages for the multi-task technique. We further evaluate the models on full-scale WMT tasks, finding that the multi-task model aids low- and medium-resource NMT but degrades high-resource English→German translation.
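To make the data setup concrete, the sketch below illustrates how a source-side constituency parse could be linearized and paired with translation data for the two strategies the abstract describes: a multi-task translation-and-parsing model with a shared encoder and decoder, and a mixed encoder model trained on both parsed and unparsed source sentences. This is a minimal sketch, not the authors' implementation: the task tags, linearization format, helper names, and use of NLTK are illustrative assumptions.

```python
# Minimal sketch (assumed formats, not the paper's exact pipeline) of building
# training examples from a source sentence, its target translation, and its
# constituency parse.

from nltk.tree import Tree  # NLTK used here only for convenient parse handling


def linearize(parse) -> str:
    """Flatten a constituency tree into a bracketed token sequence,
    e.g. '(S (NP the cat ) (VP sat ) )'."""
    if isinstance(parse, str):  # leaf node: a source word
        return parse
    children = " ".join(linearize(child) for child in parse)
    return f"({parse.label()} {children} )"


def multi_task_examples(src: str, tgt: str, parse: Tree):
    """Multi-task setup: a single encoder-decoder sees both translation and
    parsing examples, distinguished by a (hypothetical) task tag."""
    return [
        ("<2trans> " + src, tgt),                # translation example
        ("<2parse> " + src, linearize(parse)),   # parsing example
    ]


def mixed_encoder_examples(src: str, tgt: str, parse: Tree):
    """Mixed encoder setup: the model learns to translate to the same target
    from both the plain and the linearized-parse version of the source."""
    return [
        (src, tgt),                # unparsed source -> target
        (linearize(parse), tgt),   # parsed source   -> target
    ]


if __name__ == "__main__":
    tree = Tree.fromstring("(S (NP the cat) (VP sat))")
    for pair in multi_task_examples("the cat sat", "die Katze saß", tree):
        print(pair)
    for pair in mixed_encoder_examples("the cat sat", "die Katze saß", tree):
        print(pair)
```

Because both strategies only change the training data, the Transformer architecture itself is left unmodified, as noted in the abstract.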

[1] Philipp Koehn, et al. Europarl: A Parallel Corpus for Statistical Machine Translation, 2005, MT Summit.

[2] Quoc V. Le, et al. Multi-task Sequence to Sequence Learning, 2015, ICLR.

[3] Matt Post, et al. Sockeye: A Toolkit for Neural Machine Translation, 2018, ArXiv.

[4] Guodong Zhou, et al. Modeling Source Syntax for Neural Machine Translation, 2017, ACL.

[5] Philipp Koehn, et al. Findings of the 2018 Conference on Machine Translation (WMT18), 2018, WMT.

[6] Meishan Zhang, et al. Syntax-Enhanced Neural Machine Translation with Syntax-Aware Word Representations, 2019, NAACL.

[7] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.

[8] Rico Sennrich, et al. Predicting Target Language CCG Supertags Improves Neural Machine Translation, 2017, WMT.

[9] Rico Sennrich, et al. Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures, 2018, EMNLP.

[10] Khalil Sima'an, et al. Graph Convolutional Encoders for Syntax-aware Neural Machine Translation, 2017, EMNLP.

[11] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.

[12] Christopher D. Manning, et al. Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks, 2015, ACL.

[13] Yoshua Bengio, et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, 2014, EMNLP.

[14] Philipp Koehn, et al. Moses: Open Source Toolkit for Statistical Machine Translation, 2007, ACL.

[15] Christof Monz, et al. The Importance of Being Recurrent for Modeling Hierarchical Structure, 2018, EMNLP.

[16] Philipp Koehn, et al. Findings of the 2017 Conference on Machine Translation (WMT17), 2017, WMT.

[17] Eliyahu Kiperwasser, et al. Scheduled Multi-Task Learning: From Syntax to Translation, 2018, TACL.

[18] Martin Wattenberg, et al. Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation, 2016, TACL.

[19] Beatrice Santorini, et al. Building a Large Annotated Corpus of English: The Penn Treebank, 1993, CL.

[20] Rico Sennrich, et al. Neural Machine Translation of Rare Words with Subword Units, 2015, ACL.

[21] Geoffrey E. Hinton, et al. Grammar as a Foreign Language, 2014, NIPS.

[22] Mihai Surdeanu, et al. The Stanford CoreNLP Natural Language Processing Toolkit, 2014, ACL.

[23] Yoav Goldberg, et al. Towards String-To-Tree Neural Machine Translation, 2017, ACL.

[24] Ryan Cotterell, et al. Are All Languages Equally Hard to Language-Model?, 2018, NAACL.

[25] Karin M. Verspoor, et al. Findings of the 2016 Conference on Machine Translation, 2016, WMT.

[26] Jörg Tiedemann, et al. An Analysis of Encoder Representations in Transformer-Based Machine Translation, 2018, BlackboxNLP@EMNLP.

[27] Nan Yang, et al. Dependency-to-Dependency Neural Machine Translation, 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[28] Yoshimasa Tsuruoka, et al. Tree-to-Sequence Attentional Neural Machine Translation, 2016, ACL.

[29] Eugene Charniak, et al. Parsing as Language Modeling, 2016, EMNLP.

[30] Kenneth Heafield, et al. Multi-Source Syntactic Neural Machine Translation, 2018, EMNLP.