Predicting Target Language CCG Supertags Improves Neural Machine Translation

Neural machine translation (NMT) models are able to partially learn syntactic information from sequential lexical information. Still, some complex syntactic phenomena such as prepositional phrase attachment are poorly modeled. This work aims to answer two questions: 1) Does explicitly modeling target language syntax help NMT? 2) Is tight integration of words and syntax better than multitask training? We introduce syntactic information in the form of CCG supertags in the decoder, by interleaving the target supertags with the word sequence. Our results on WMT data show that explicitly modeling target syntax improves machine translation quality for German→English, a high-resource pair, and for Romanian→English, a low-resource pair, as well as for several syntactic phenomena, including prepositional phrase attachment. Furthermore, a tight coupling of words and syntax improves translation quality more than multitask training. By combining target syntax with source-side dependency labels in the embedding layer, we obtain a total improvement of 0.9 BLEU for German→English and 1.2 BLEU for Romanian→English.
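As a rough illustration of the interleaving idea described above, the following minimal sketch (not the authors' code) shows how a target word sequence might be merged with its CCG supertags into a single decoder-side output sequence; the example sentence and supertags are purely illustrative.

```python
# Minimal sketch, assuming one supertag per target word and a simple
# "tag before word" interleaving scheme (illustrative only).

def interleave(supertags, words):
    """Return the interleaved target sequence: tag1 word1 tag2 word2 ..."""
    assert len(supertags) == len(words), "one supertag per word assumed"
    out = []
    for tag, word in zip(supertags, words):
        out.append(tag)   # syntactic token (CCG supertag)
        out.append(word)  # lexical token
    return out

words = ["We", "saw", "her"]
tags = ["NP", "(S\\NP)/NP", "NP"]  # hypothetical supertags for the example
print(" ".join(interleave(tags, words)))
# NP We (S\NP)/NP saw NP her
```

The interleaved sequence would then replace the plain word sequence as the decoder's training target, so that the model predicts each word's supertag alongside the word itself.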
