Supervised Seeded Iterated Learning for Interactive Language Learning

Language drift has been one of the major obstacles to train language models through interaction. When word-based conversational agents are trained towards completing a task, they tend to invent their language rather than leveraging natural language. In recent literature, two general methods partially counter this phenomenon: Supervised Selfplay (S2P) and Seeded Iterated Learning (SIL). While S2P jointly trains interactive and supervised losses to counter the drift, SIL changes the training dynamics to prevent language drift from occurring. In this paper, we first highlight their respective weaknesses, i.e., late-stage training collapses and higher negative likelihood when evaluated on human corpus. Given these observations, we introduce Supervised Seeded Iterated Learning to combine both methods to minimize their respective weaknesses. We then show the effectiveness of \algo in the language-drift translation game.

[1]  Philipp Koehn,et al.  Moses: Open Source Toolkit for Statistical Machine Translation , 2007, ACL.

[2]  Jason Weston,et al.  Dialogue Learning With Human-In-The-Loop , 2016, ICLR.

[3]  Bing Liu,et al.  Bootstrapping a Neural Conversational Agent with Dialogue Self-Play, Crowdsourcing and On-Line Reinforcement Learning , 2018, NAACL.

[4]  Hugo Larochelle,et al.  GuessWhat?! Visual Object Discovery through Multi-modal Dialogue , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  S. Kirby,et al.  Iterated learning and the evolution of language , 2014, Current Opinion in Neurobiology.

[6]  Peter Stone,et al.  Learning to Interpret Natural Language Commands through Human-Robot Dialog , 2015, IJCAI.

[7]  Simon Kirby,et al.  Spontaneous evolution of linguistic structure-an iterated learning model of the emergence of regularity and irregularity , 2001, IEEE Trans. Evol. Comput..

[8]  Angeliki Lazaridou,et al.  Multi-agent Communication meets Natural Language: Synergies between Functional and Structural Language Learning , 2020, ACL.

[9]  Terry Winograd,et al.  Procedures As A Representation For Data In A Computer Program For Understanding Natural Language , 1971 .

[10]  Khalil Sima'an,et al.  Multi30K: Multilingual English-German Image Descriptions , 2016, VL@ACL.

[11]  Milica Gasic,et al.  POMDP-Based Statistical Spoken Dialog Systems: A Review , 2013, Proceedings of the IEEE.

[12]  Alexander Peysakhovich,et al.  Multi-Agent Cooperation and the Emergence of (Natural) Language , 2016, ICLR.

[13]  Stefan Lee,et al.  Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[14]  Quoc V. Le,et al.  AirDialogue: An Environment for Goal-Oriented Dialogue Research , 2018, EMNLP.

[15]  Mauro Cettolo,et al.  WIT3: Web Inventory of Transcribed and Translated Talks , 2012, EAMT.

[16]  Vladlen Koltun,et al.  Multi-Task Learning as Multi-Objective Optimization , 2018, NeurIPS.

[17]  Ruslan Salakhutdinov,et al.  Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning , 2015, ICLR.

[18]  Rico Sennrich,et al.  Neural Machine Translation of Rare Words with Subword Units , 2015, ACL.

[19]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[20]  Jason Weston,et al.  Learning End-to-End Goal-Oriented Dialog , 2016, ICLR.

[21]  Joelle Pineau,et al.  On the interaction between supervision and self-play in emergent communication , 2020, ICLR.

[22]  Dimitris N. Metaxas,et al.  Interactive Reinforcement Learning for Object Grounding via Self-Talking , 2017, ArXiv.

[23]  Yishay Mansour,et al.  Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.

[24]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[25]  Jianfeng Gao,et al.  Deep Reinforcement Learning for Dialogue Generation , 2016, EMNLP.

[26]  Karol Hausman,et al.  Gradient Surgery for Multi-Task Learning , 2020, NeurIPS.

[27]  Ben Poole,et al.  Categorical Reparametrization with Gumble-Softmax , 2017, ICLR 2017.

[28]  Kyunghyun Cho,et al.  Countering Language Drift via Visual Grounding , 2019, EMNLP.

[29]  Hannes Schulz,et al.  Frames: a corpus for adding memory to goal-oriented dialogue systems , 2017, SIGDIAL Conference.

[30]  Stefan Lee,et al.  Evaluating Visual Conversational Agents via Cooperative Human-AI Games , 2017, HCOMP.

[31]  Antoine Raux,et al.  The Dialog State Tracking Challenge Series , 2014, AI Mag..

[32]  Oliver Lemon,et al.  Author manuscript, published in "European Conference on Speech Communication and Technologies (Interspeech'07), Anvers: Belgium (2007)" Machine Learning for Spoken Dialogue Systems , 2022 .

[33]  Yuchen Lu,et al.  Countering Language Drift with Seeded Iterated Learning , 2020, ICML.