Countering Language Drift via Visual Grounding

Emergent multi-agent communication protocols are very different from natural language and not easily interpretable by humans. We find that agents initially pretrained to produce natural language can also experience detrimental language drift: when a non-linguistic reward is used in a goal-based task, e.g., a scalar success metric, the communication protocol may easily and radically diverge from natural language. We recast translation as a multi-agent communication game and examine auxiliary training constraints for their effectiveness in mitigating language drift. We show that a combination of syntactic (language model likelihood) and semantic (visual grounding) constraints gives the best communication performance, allowing pretrained agents to retain English syntax while learning to accurately convey the intended meaning.
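To make the combination of constraints concrete, here is a minimal sketch, in PyTorch, of how a task reward (policy gradient), a language-model likelihood term (syntactic constraint), and a visual grounding term (semantic constraint) might be combined into one training loss. The function and weight names (`combined_loss`, `alpha_lm`, `alpha_ground`) are illustrative assumptions, not the paper's actual code or hyperparameters.

```python
import torch


def combined_loss(log_probs, reward, lm_log_probs, grounding_loss,
                  alpha_lm=0.1, alpha_ground=1.0):
    """Weighted sum of the three training signals described in the abstract.

    log_probs:      (T,) log-probabilities of the tokens the agent emitted
    reward:         scalar task reward (e.g. a translation success metric)
    lm_log_probs:   (T,) log-probabilities of the same tokens under a fixed
                    pretrained language model (syntactic constraint)
    grounding_loss: scalar loss tying the message to a paired image
                    (semantic constraint), e.g. an image-caption ranking loss
    """
    # REINFORCE-style policy gradient term: maximize expected task reward.
    pg_loss = -(reward * log_probs.sum())
    # Penalize messages the pretrained language model finds unlikely.
    lm_loss = -lm_log_probs.sum()
    # Weighted combination; the weights are assumed hyperparameters.
    return pg_loss + alpha_lm * lm_loss + alpha_ground * grounding_loss
```

The intuition is that the reward alone drives drift, while the two auxiliary terms anchor the messages: the language-model term keeps them syntactically English-like, and the grounding term keeps their meaning tied to the visual input.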
