Recurrent Graph Syntax Encoder for Neural Machine Translation

Syntax-aware machine translation models have proven successful at improving a model's reasoning and meaning-preservation ability. In this paper, we propose a simple yet effective graph-structured encoder, the Recurrent Graph Syntax Encoder, dubbed \textbf{RGSE}, which enhances the encoder's ability to capture useful syntactic information. RGSE operates over a standard encoder (recurrent or self-attentional), treating recurrent network units as graph nodes and injecting syntactic dependencies as edges, so that it models syntactic dependencies and sequential information (\textit{i.e.}, word order) simultaneously. Our approach achieves considerable improvements over several syntax-aware NMT models on English$\Rightarrow$German and English$\Rightarrow$Czech translation tasks, and an RGSE-equipped big model obtains results competitive with the state of the art on the WMT14 English$\Rightarrow$German task. Extensive analysis further verifies that RGSE benefits long-sentence modeling and produces better translations.
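To make the architecture concrete, the following is a minimal sketch of the core idea in PyTorch. It is a hypothetical implementation, not the authors' code: the class name, the `steps` hyperparameter, and the gated mean-over-neighbors update are illustrative assumptions; the shared ingredients with the paper are a standard recurrent encoder for word order plus message passing along parser-produced dependency edges.

```python
# Hypothetical sketch of an RGSE-style encoder (not the authors' released code):
# a recurrent encoder whose hidden states also aggregate messages along
# dependency edges, so syntax and word order are modeled jointly.
import torch
import torch.nn as nn


class RecurrentGraphSyntaxEncoder(nn.Module):
    def __init__(self, vocab_size: int, d_model: int, steps: int = 2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Sequential component: a bidirectional GRU captures word order.
        self.rnn = nn.GRU(d_model, d_model // 2, batch_first=True,
                          bidirectional=True)
        # Graph component: a gated update mixing each node's state with
        # messages from its syntactic neighbors (heads and children).
        self.msg = nn.Linear(d_model, d_model)
        self.gate = nn.Linear(2 * d_model, d_model)
        self.steps = steps  # number of graph propagation rounds (assumed)

    def forward(self, tokens: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len) word ids
        # adj:    (batch, seq_len, seq_len) 0/1 dependency adjacency matrix,
        #         e.g. parser edges made symmetric, with self-loops added
        h, _ = self.rnn(self.embed(tokens))         # sequential encoding
        for _ in range(self.steps):                 # graph propagation
            deg = adj.sum(-1, keepdim=True).clamp(min=1)
            m = torch.bmm(adj, self.msg(h)) / deg   # mean over neighbors
            g = torch.sigmoid(self.gate(torch.cat([h, m], dim=-1)))
            h = g * h + (1 - g) * m                 # gated residual update
        return h                                    # syntax-aware states
```

Usage would be `states = encoder(token_ids, dependency_adjacency)`, with the adjacency matrix built offline from an external dependency parser; the gated residual update is one plausible way to let the model interpolate between sequential and syntactic evidence.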
