Zero-Resource Neural Machine Translation with Multi-Agent Communication Game

While end-to-end neural machine translation (NMT) has achieved notable success in recent years in translating a handful of resource-rich language pairs, it still suffers from data scarcity for low-resource language pairs and domains. To tackle this problem, we propose an interactive multimodal framework for zero-resource neural machine translation. Instead of being passively exposed to large amounts of parallel corpora, our learners (implemented as encoder-decoder architectures) engage in cooperative image description games, and thus develop their own image captioning or neural machine translation models out of the need to communicate in order to succeed at the game. Experimental results on the IAPR-TC12 and Multi30K datasets show that the proposed learning mechanism significantly improves over state-of-the-art methods.
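
The abstract describes the training signal only at a high level. The toy Python below sketches one plausible reading of the game loop: a captioner agent emits a source-language description of an image, a translator agent maps that message into the target language, and both agents receive a shared reward and are updated with a REINFORCE-style policy gradient (cf. [11]). The class names, the word-overlap reward, and the word-by-word translator are illustrative assumptions, not the paper's actual architecture or reward.

```python
# Minimal, self-contained sketch of a cooperative image-description game.
# All names and the reward definition are assumptions made for illustration.
import math
import random

SRC_VOCAB = ["ein", "mann", "faehrt", "rad", "hund"]
TGT_VOCAB = ["a", "man", "rides", "bike", "dog"]


def softmax(logits):
    exps = [math.exp(x) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


class ToyCaptioner:
    """Stands in for the encoder-decoder captioner: emits a fixed-length
    source-language description (image conditioning is omitted in this toy)."""
    def __init__(self, vocab, length=4, lr=0.5):
        self.vocab, self.length, self.lr = vocab, length, lr
        self.logits = [0.0] * len(vocab)

    def sample(self):
        p = softmax(self.logits)
        idx = random.choices(range(len(self.vocab)), weights=p, k=self.length)
        return [self.vocab[i] for i in idx]

    def reinforce(self, sentence, reward):
        # REINFORCE: raise the probability of emitted words, scaled by reward.
        p = softmax(self.logits)
        for w in sentence:
            i = self.vocab.index(w)
            for j in range(len(self.logits)):
                self.logits[j] += self.lr * reward * ((j == i) - p[j])


class ToyTranslator:
    """Stands in for the encoder-decoder translator: a word-by-word lexical
    policy that maps each received source word to one target word."""
    def __init__(self, src_vocab, tgt_vocab, lr=0.5):
        self.src_vocab, self.tgt_vocab, self.lr = src_vocab, tgt_vocab, lr
        self.logits = {s: [0.0] * len(tgt_vocab) for s in src_vocab}

    def translate(self, src_sentence):
        out = []
        for s in src_sentence:
            p = softmax(self.logits[s])
            j = random.choices(range(len(self.tgt_vocab)), weights=p)[0]
            out.append(self.tgt_vocab[j])
        return out

    def reinforce(self, src_sentence, tgt_sentence, reward):
        for s, t in zip(src_sentence, tgt_sentence):
            p = softmax(self.logits[s])
            i = self.tgt_vocab.index(t)
            for j in range(len(self.tgt_vocab)):
                self.logits[s][j] += self.lr * reward * ((j == i) - p[j])


def reward_fn(gold_tgt_caption, hypothesis):
    """Toy communication reward: unigram overlap with the image's gold
    target-language caption (an assumed proxy for successful communication)."""
    return len(set(gold_tgt_caption) & set(hypothesis)) / len(set(gold_tgt_caption))


# One "image" described monolingually on the target side; no parallel sentences.
gold_tgt = ["a", "man", "rides", "bike"]

captioner = ToyCaptioner(SRC_VOCAB)
translator = ToyTranslator(SRC_VOCAB, TGT_VOCAB)

for step in range(300):
    src_msg = captioner.sample()              # captioner describes the image
    tgt_msg = translator.translate(src_msg)   # translator decodes the message
    r = reward_fn(gold_tgt, tgt_msg)          # shared game reward
    captioner.reinforce(src_msg, r)           # both agents learn from the same signal
    translator.reinforce(src_msg, tgt_msg, r)
```

In this reading, the translator only ever sees the captioner's sampled sentences, so a translation model can emerge without any parallel corpus; the image (here reduced to its gold target caption) acts as the pivot that grounds the shared reward.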

[1] Yann Dauphin, et al. Convolutional Sequence to Sequence Learning, 2017, ICML.

[2] Stefan Riezler, et al. Multimodal Pivots for Image Caption Translation, 2016, ACL.

[3] Marc'Aurelio Ranzato, et al. Sequence Level Training with Recurrent Neural Networks, 2015, ICLR.

[4] Yang Liu, et al. A Teacher-Student Framework for Zero-Resource Neural Machine Translation, 2017, ACL.

[5] Yoshua Bengio, et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, 2015, ICML.

[6] Yang Liu, et al. Maximum Expected Likelihood Estimation for Zero-resource Neural Machine Translation, 2017, IJCAI.

[7] Khalil Sima'an, et al. Multi30K: Multilingual English-German Image Descriptions, 2016, VL@ACL.

[8] Philipp Koehn, et al. Moses: Open Source Toolkit for Statistical Machine Translation, 2007, ACL.

[9] Phil Blunsom, et al. Recurrent Continuous Translation Models, 2013, EMNLP.

[10] Quoc V. Le, et al. Sequence to Sequence Learning with Neural Networks, 2014, NIPS.

[11] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.

[12] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.

[13] Fei-Fei Li, et al. Deep Visual-Semantic Alignments for Generating Image Descriptions, 2015, CVPR.

[14] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.

[15] Frank Keller, et al. Image Pivoting for Learning Multilingual Multimodal Representations, 2017, EMNLP.

[16] Angeliki Lazaridou, et al. Towards Multi-Agent Communication-Based Language Learning, 2016, ArXiv.

[17] Hideki Nakayama, et al. Zero-Resource Machine Translation by Multimodal Encoder–Decoder Network with Multimedia Pivot, 2016, Machine Translation.

[18] Rico Sennrich, et al. Neural Machine Translation of Rare Words with Subword Units, 2015, ACL.

[19] Ivan Titov, et al. Emergence of Language with Multi-agent Games: Learning to Communicate with Sequences of Symbols, 2017, NIPS.

[20] Piek T. J. M. Vossen, et al. Cross-linguistic Differences and Similarities in Image Descriptions, 2017, INLG.

[21] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, CVPR.

[22] Yaser Al-Onaizan, et al. Zero-Resource Translation with Multi-Lingual Neural Machine Translation, 2016, EMNLP.

[23] Tie-Yan Liu, et al. Dual Learning for Machine Translation, 2016, NIPS.

[24] Deniz Yuret, et al. Transfer Learning for Low-Resource Neural Machine Translation, 2016, EMNLP.

[25] Martin Wattenberg, et al. Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation, 2016, TACL.

[26] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.

[27] Yang Liu, et al. Joint Training for Pivot-Based Neural Machine Translation, 2017, IJCAI.

[28] Paul Clough, et al. The IAPR TC-12 Benchmark: A New Evaluation Resource for Visual Information Systems, 2006.

[29] Peter Young, et al. From Image Descriptions to Visual Denotations: New Similarity Metrics for Semantic Inference over Event Descriptions, 2014, TACL.

[30] Nick Campbell, et al. Doubly-Attentive Decoder for Multi-modal Neural Machine Translation, 2017, ACL.

[31] Joost van de Weijer, et al. Does Multimodality Help Human and Machine for Translation and Image Captioning?, 2016, WMT.