Towards More Diverse Input Representation for Neural Machine Translation

Source input information plays an important role in Transformer-based translation systems. In practice, the word embedding and positional embedding of each word are summed to form the input representation, and self-attention networks then encode the global dependencies in this representation to produce the source representation. However, this process relies on a single source feature and excludes richer, more diverse features such as recurrence features, local features, and syntactic features, which yields a less informative source representation and thereby limits further improvements in translation performance. In this paper, we introduce a simple and efficient method that encodes multiple diverse source features into the input representation simultaneously, so that the self-attention networks can learn a more effective source representation. In particular, the proposed grouped strategy is applied only to the input representation layer, preserving the diversity of translation information while retaining the efficiency of the self-attention networks. Experimental results show that our approach improves translation performance over state-of-the-art Transformer baselines on the WMT14 English-to-German and NIST Chinese-to-English machine translation tasks.
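
As a rough illustration of the grouped idea, the PyTorch sketch below partitions the model dimension into groups and lets each group fuse the word embedding with a different source feature, such as absolute position, a reversed position as a recurrence-like cue, a local window index, and syntactic depth from a parse. The class name, the particular feature set, and the additive per-group fusion are assumptions made for illustration, not the paper's exact formulation; the grouped input is simply fed to a standard Transformer encoder.

```python
import torch
import torch.nn as nn


class GroupedInputRepresentation(nn.Module):
    """Illustrative sketch (not the paper's exact method): the model dimension
    is split into groups, and each group fuses the word embedding with one
    additional source feature before the standard Transformer encoder."""

    def __init__(self, vocab_size, d_model=512, max_len=1024, max_syntax_depth=64):
        super().__init__()
        self.num_groups = 4  # one group per feature below (assumed feature set)
        assert d_model % self.num_groups == 0
        self.d_group = d_model // self.num_groups
        self.word_emb = nn.Embedding(vocab_size, d_model)
        # Assumed per-group features: absolute position, reversed position
        # (a cheap recurrence-like cue), local window index, syntactic depth.
        self.pos_emb = nn.Embedding(max_len, self.d_group)
        self.rev_pos_emb = nn.Embedding(max_len, self.d_group)
        self.window_emb = nn.Embedding(max_len, self.d_group)
        self.syntax_emb = nn.Embedding(max_syntax_depth, self.d_group)

    def forward(self, tokens, syntax_depth, window_size=3):
        # tokens, syntax_depth: (batch, seq_len) integer tensors; syntax_depth
        # would come from an external parser (e.g. depth in a dependency tree).
        batch, seq_len = tokens.shape
        pos = torch.arange(seq_len, device=tokens.device).expand(batch, seq_len)
        rev_pos = (seq_len - 1) - pos
        window_id = pos // window_size

        # (batch, seq_len, num_groups, d_group): the lexical signal stays in every group.
        groups = self.word_emb(tokens).view(batch, seq_len, self.num_groups, self.d_group)
        feats = torch.stack(
            [self.pos_emb(pos), self.rev_pos_emb(rev_pos),
             self.window_emb(window_id), self.syntax_emb(syntax_depth)], dim=2)

        # Each group = its word-embedding slice + its own feature embedding,
        # flattened back to d_model and passed unchanged to the encoder.
        return (groups + feats).view(batch, seq_len, -1)


# Usage sketch: feed the grouped representation to a vanilla Transformer encoder.
inputs = GroupedInputRepresentation(vocab_size=32000)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True), num_layers=6)
tokens = torch.randint(0, 32000, (2, 10))
syntax_depth = torch.randint(0, 64, (2, 10))
source_repr = encoder(inputs(tokens, syntax_depth))  # (2, 10, 512)
```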
