Predicting Target Language CCG Supertags Improves Neural Machine Translation

Neural machine translation (NMT) models are able to partially learn syntactic information from sequential lexical information. Still, some complex syntactic phenomena such as prepositional phrase attachment are poorly modeled. This work aims to answer two questions: 1) Does explicitly modeling target language syntax help NMT? 2) Is tight integration of words and syntax better than multitask training? We introduce syntactic information in the form of CCG supertags in the decoder by interleaving the target supertags with the word sequence. Our results on WMT data show that explicitly modeling target syntax improves machine translation quality both for German->English, a high-resource pair, and for Romanian->English, a low-resource pair, and also improves the translation of several syntactic phenomena, including prepositional phrase attachment. Furthermore, a tight coupling of words and syntax improves translation quality more than multitask training. By combining target syntax with source-side dependency labels in the embedding layer, we obtain a total improvement of 0.9 BLEU for German->English and 1.2 BLEU for Romanian->English.
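The interleaving itself amounts to a simple preprocessing step on the target side of the training data: each word is preceded by its CCG supertag, and the decoder is trained to emit the mixed tag/word sequence. The sketch below illustrates the idea in Python; the function name and example sentence are illustrative, and the paper's actual supertagger, tag placement for BPE subword units, and preprocessing pipeline are not reproduced here.

```python
# Minimal sketch of target-side supertag interleaving, assuming each
# target word has already been assigned a CCG supertag by an external
# supertagger (not shown here).

def interleave_supertags(words, supertags):
    """Prefix each target word with its CCG supertag, producing the
    mixed tag/word sequence the decoder is trained to generate."""
    if len(words) != len(supertags):
        raise ValueError("need exactly one supertag per word")
    interleaved = []
    for tag, word in zip(supertags, words):
        interleaved.append(tag)   # supertag token comes first ...
        interleaved.append(word)  # ... immediately followed by the word
    return interleaved

# Example target sentence with standard CCG categories:
words = ["George", "eats", "apples"]
supertags = ["NP", r"(S\NP)/NP", "NP"]
print(" ".join(interleave_supertags(words, supertags)))
# -> NP George (S\NP)/NP eats NP apples
```

At test time the predicted supertag tokens would then be stripped from the decoder output before computing word-level metrics such as BLEU.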
