Incorporating Source-Side Phrase Structures into Neural Machine Translation

Neural machine translation (NMT) has shown great success as an alternative to traditional statistical machine translation across multiple language pairs. Early NMT models are based on sequence-to-sequence learning, which encodes a sequence of source words into a vector space and generates a sequence of target words from that vector. In these models, sentences are treated simply as sequences of words without any internal structure. In this article, we focus on the role of the syntactic structure of source sentences and propose a novel end-to-end syntactic NMT model, which we call a tree-to-sequence NMT model, that extends a sequence-to-sequence model with source-side phrase structure. The proposed model has an attention mechanism that enables the decoder to generate each translated word while softly aligning it with phrases as well as words of the source sentence. We empirically compare the proposed model with sequence-to-sequence models in various settings on Chinese-to-Japanese and English-to-Japanese translation tasks. Our experimental results suggest that the use of syntactic structure can be beneficial when the training data set is small, but is not as effective as using a bi-directional encoder. As the size of the training data set increases, the benefit of using a syntactic tree tends to diminish.
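To make the phrase-level attention concrete, the sketch below (NumPy; all names, shapes, and the bilinear scoring function are illustrative assumptions, not the paper's exact formulation) shows the core idea: at each decoding step, the decoder scores every source node, both word-level states from the sequential encoder and phrase-level states from the tree encoder, and forms a context vector from the resulting soft alignment.

```python
# Minimal sketch of attention over source words AND source phrases.
# Hypothetical names and a bilinear score are assumptions for illustration.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(decoder_state, word_states, phrase_states, W_a):
    """Softly align the decoder state with every source node.

    decoder_state : (d,)    current target-side hidden state
    word_states   : (n, d)  hidden states of the n source words (leaves)
    phrase_states : (m, d)  hidden states of the m source phrases (internal nodes)
    W_a           : (d, d)  attention weights (bilinear score is an assumption)
    """
    nodes = np.vstack([word_states, phrase_states])  # (n + m, d): words then phrases
    scores = nodes @ (W_a @ decoder_state)           # (n + m,) alignment scores
    alpha = softmax(scores)                          # soft alignment over words + phrases
    context = alpha @ nodes                          # (d,) attended source context
    return context, alpha

# Toy usage with random vectors, only to show the shapes involved.
rng = np.random.default_rng(0)
d, n_words, n_phrases = 8, 5, 4
context, alpha = attend(
    decoder_state=rng.standard_normal(d),
    word_states=rng.standard_normal((n_words, d)),
    phrase_states=rng.standard_normal((n_phrases, d)),
    W_a=rng.standard_normal((d, d)),
)
print(alpha.round(3), context.shape)
```

In this sketch the context vector would then feed the prediction of the next target word, so a single target word can draw on a whole source phrase rather than only on individual source words.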
