How can we teach artificial agents to use human language flexibly to solve problems in real-world environments? We have an example of this in nature: human babies eventually learn to use human language to solve problems, and they are taught with an adult human in the loop. Unfortunately, current machine learning methods (e.g., deep reinforcement learning) are too data-inefficient to learn language in this way (3). An outstanding goal is to find an algorithm with a suitable ‘language learning prior’ that allows it to learn human language while minimizing the number of on-policy human interactions. In this paper, we propose to learn such a prior in simulation, leveraging the increasing amount of compute available for machine learning experiments (1). We call our approach Learning to Learn to Communicate (L2C). Specifically, in L2C we train a meta-learning agent in simulation to interact with populations of pre-trained agents, each with its own distinct communication protocol. Once the meta-learning agent is able to quickly adapt to each population of agents, it can be deployed in new populations, including populations speaking human language. Our key insight is that such populations can be obtained via self-play after pre-training agents with imitation learning on a small amount of off-policy human language data. We call this latter technique Seeded Self-Play (S2P). Our preliminary experiments show that agents trained with L2C and S2P need fewer on-policy samples to learn a compositional language in a Lewis signaling game (4).
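To make the outer/inner structure of L2C concrete, the following is a minimal sketch of a first-order, Reptile-style meta-training loop (9) over populations with distinct protocols, with a toy softmax listener standing in for the agent and random concept-to-symbol mappings standing in for S2P-seeded populations. All names here (make_seeded_population, inner_adapt, evaluate) and all hyperparameters are hypothetical illustrations, not the implementation used in our experiments.

```python
# Minimal sketch of an L2C-style meta-training loop (assumed Reptile-style update).
import random
import numpy as np


def make_seeded_population(seed, vocab_size=10, num_concepts=5):
    """Hypothetical stand-in for an S2P-seeded population: each population
    has its own fixed mapping from concepts to symbols (its protocol)."""
    rng = np.random.default_rng(seed)
    return rng.permutation(vocab_size)[:num_concepts]  # concept -> symbol


def inner_adapt(weights, protocol, steps=20, lr=0.5):
    """Inner loop: fit a softmax listener to one population's protocol
    from a few on-policy interactions (sampled concept/symbol pairs)."""
    w = weights.copy()  # (num_concepts, vocab_size) logits
    for _ in range(steps):
        concept = random.randrange(len(protocol))
        symbol = protocol[concept]
        probs = np.exp(w[concept]) / np.exp(w[concept]).sum()
        grad = -probs
        grad[symbol] += 1.0              # gradient of the log-likelihood
        w[concept] += lr * grad
    return w


def evaluate(weights, protocol):
    """Fraction of concepts whose argmax symbol matches the protocol."""
    return float(np.mean(weights.argmax(axis=1) == protocol))


# Outer loop: meta-train across training populations, then adapt to a held-out one.
meta_weights = np.zeros((5, 10))
meta_lr = 0.3
for step in range(200):
    protocol = make_seeded_population(seed=step % 8)    # 8 training populations
    adapted = inner_adapt(meta_weights, protocol)
    meta_weights += meta_lr * (adapted - meta_weights)   # Reptile-style meta-update

held_out = make_seeded_population(seed=123)              # unseen population
fast = inner_adapt(meta_weights, held_out, steps=5)      # few on-policy samples
print("accuracy after 5 adaptation steps:", evaluate(fast, held_out))
```

The point of the sketch is the loop structure only: an inner adaptation phase per population and an outer update of the meta-learner's initialization; it makes no claim about the architecture, games, or results reported in the paper.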
References

[1] Ben Poole et al. Categorical Reparameterization with Gumbel-Softmax, 2016, ICLR.
[2] José M. F. Moura et al. Natural Language Does Not Emerge ‘Naturally’ in Multi-Agent Dialog, 2017, EMNLP.
[3] Jason Weston et al. Talk the Walk: Navigating New York City through Grounded Dialogue, 2018, arXiv.
[4] Pieter Abbeel et al. Emergence of Grounded Compositional Language in Multi-Agent Populations, 2017, AAAI.
[5] Igor Mordatch et al. A Paradigm for Situated and Goal-Driven Language Learning, 2016, arXiv.
[6] Pushmeet Kohli et al. Learning to Follow Language Instructions with Adversarial Reward Induction, 2018, arXiv.
[7] David Lewis. Convention: A Philosophical Study, 1986.
[8] Yi Wu et al. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments, 2017, NIPS.
[9] Joshua Achiam et al. On First-Order Meta-Learning Algorithms, 2018, arXiv.
[10] Jimmy Ba et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[11] Sergey Levine et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, 2017, ICML.
[12] Doina Precup et al. Shaping representations through communication, 2018.
[14] Alexander Peysakhovich et al. Multi-Agent Cooperation and the Emergence of (Natural) Language, 2016, ICLR.