Skill Discovery of Coordination in Multi-agent Reinforcement Learning

Unsupervised skill discovery drives intelligent agents to explore unknown environments without task-specific reward signals, and the skills acquired in this way may be useful when the agents adapt to new tasks. In this paper, we propose "Multi-agent Skill Discovery" (MASD), a method for discovering skills that correspond to coordination patterns of multiple agents. The proposed method maximizes the mutual information between a latent code Z representing skills and the combination of the states of all agents, while suppressing the empowerment of Z on the state of any single agent through adversarial training. In other words, it imposes an information bottleneck to avoid empowerment degeneracy. First, we show the emergence of various skills at the level of coordination in a general particle multi-agent environment. Second, we reveal that the "bottleneck" prevents skills from collapsing onto a single agent and enhances the diversity of the learned skills. Finally, we show that the pretrained policies perform better on supervised RL tasks.
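
To make the description above concrete, a minimal sketch of the kind of objective it implies is given below; the notation (joint state $(s_1, \dots, s_N)$, per-agent states $s_i$, and the trade-off coefficient $\beta$) is an illustrative assumption, not necessarily the paper's exact formulation:

\[
  \max_{\pi}\; I\big(Z;\, s_1, \dots, s_N\big) \;-\; \beta \max_{1 \le i \le N} I\big(Z;\, s_i\big)
\]

Here the first term rewards skills that shape the joint behavior of all agents, while the second term, trained adversarially, penalizes any skill whose identity can be recovered from a single agent's state alone, which is one way to read the "information bottleneck" preventing skills from collapsing to a single agent.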
