A Comparative Study on Model Averaging, Ensembling and Reranking in NMT

Neural machine translation (NMT) has become a benchmark method in machine translation. Many novel structures and methods have been proposed to improve translation quality, but the resulting models remain difficult to train and tune. In this paper, we focus on decoding techniques that boost translation performance by utilizing existing models. We address the problem at three levels, parameter, word, and sentence, corresponding to checkpoint averaging, model ensembling, and candidate reranking, none of which requires retraining the model. Experimental results show that these decoding approaches significantly improve performance over the baseline model.
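To make the three levels concrete, below is a minimal sketch of the first two techniques, assuming PyTorch-style checkpoints saved as state_dicts and a hypothetical per-step model interface that maps a target prefix to next-token logits; the function and file names are illustrative, not the paper's actual code.

```python
# Sketch of parameter-level checkpoint averaging and word-level model
# ensembling for NMT decoding. Assumes PyTorch; paths and the model
# call signature are hypothetical.
import torch


def average_checkpoints(paths):
    """Average the parameters of several checkpoints of one model."""
    avg = None
    for path in paths:
        state = torch.load(path, map_location="cpu")
        if avg is None:
            avg = {k: v.clone().float() for k, v in state.items()}
        else:
            for k, v in state.items():
                avg[k] += v.float()
    for k in avg:
        avg[k] /= len(paths)  # element-wise mean over checkpoints
    return avg


def ensemble_step(models, prefix):
    """Average the per-step output distributions of several models.

    Each model is assumed to be a callable returning next-token logits
    for the given target prefix (a hypothetical interface).
    """
    probs = torch.stack([m(prefix).softmax(dim=-1) for m in models])
    return probs.mean(dim=0)  # combined distribution over the vocabulary


# Hypothetical usage: average the last 5 checkpoints into one model.
# torch.save(average_checkpoints([f"ckpt_{i}.pt" for i in range(45, 50)]),
#            "ckpt_avg.pt")
```

Sentence-level reranking would then operate on the n-best outputs of such an ensemble, scoring each candidate with additional features (for example, a length penalty or scores from a right-to-left model) and selecting the highest-scoring translation.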
