Edinburgh Neural Machine Translation Systems for WMT 16

We participated in the WMT 2016 shared news translation task by building neural translation systems for four language pairs, each trained in both directions: English Czech, English German, English Romanian and English Russian. Our systems are based on an attentional encoder-decoder, using BPE subword segmentation for open-vocabulary translation with a fixed vocabulary. We experimented with using automatic back-translations of the monolingual News corpus as additional training data, pervasive dropout, and target-bidirectional models. All reported methods give substantial improvements, and we see improvements of 4.3--11.2 BLEU over our baseline systems. In the human evaluation, our systems were the (tied) best constrained system for 7 out of 8 translation directions in which we participated.

[1]  Philip Gage,et al.  A new algorithm for data compression , 1994 .

[2]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[3]  Matthew D. Zeiler ADADELTA: An Adaptive Learning Rate Method , 2012, ArXiv.

[4]  Razvan Pascanu,et al.  On the difficulty of training recurrent neural networks , 2012, ICML.

[5]  Christopher Kermorvant,et al.  Dropout Improves Recurrent Neural Networks for Handwriting Recognition , 2013, 2014 14th International Conference on Frontiers in Handwriting Recognition.

[6]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[7]  Zoubin Ghahramani,et al.  A Theoretically Grounded Application of Dropout in Recurrent Neural Networks , 2015, NIPS.

[8]  Ondrej Dusek,et al.  CzEng 1.6: Enlarged Czech-English Parallel Corpus with Processing Tools Dockered , 2016, TSD.

[9]  Rico Sennrich,et al.  Neural Machine Translation of Rare Words with Subword Units , 2015, ACL.

[10]  Lemao Liu,et al.  Agreement on Target-bidirectional Neural Machine Translation , 2016, NAACL.

[11]  Rico Sennrich,et al.  Improving Neural Machine Translation Models with Monolingual Data , 2015, ACL.

[12]  Rico Sennrich,et al.  The AMU-UEDIN Submission to the WMT16 News Translation Task: Attention-based NMT Models as Feature Functions in Phrase-based SMT , 2016, WMT.

[13]  Fabienne Braune,et al.  The QT21/HimL Combined Machine Translation System , 2016, WMT.