Policy Distillation and Value Matching in Multiagent Reinforcement Learning

Multiagent reinforcement learning (MARL) algorithms have been demonstrated on complex tasks that require the coordination of a team of multiple agents to complete. Existing works have focused on sharing information between agents via centralized critics to stabilize learning or through communication to improve performance, but do not generally consider how information can be shared between agents to address the curse of dimensionality in MARL. We posit that a multiagent problem can be decomposed into a multi-task problem where each agent explores a subset of the state space instead of exploring the entire state space. This paper introduces a multiagent actor-critic algorithm for combining knowledge from homogeneous agents through distillation and value-matching that outperforms policy distillation alone and allows further learning in discrete and continuous action spaces.

[1]  Andrew Y. Ng,et al.  Pharmacokinetics of a novel formulation of ivermectin after administration to goats , 2000, ICML.

[2]  Srikanth Kandula,et al.  Resource Management with Deep Reinforcement Learning , 2016, HotNets.

[3]  Pieter Abbeel,et al.  Apprenticeship learning via inverse reinforcement learning , 2004, ICML.

[4]  Long Ji Lin,et al.  Self-improving reactive agents based on reinforcement learning, planning and teaching , 1992, Machine Learning.

[5]  Alexander Peysakhovich,et al.  Multi-Agent Cooperation and the Emergence of (Natural) Language , 2016, ICLR.

[6]  Siobhán Clarke,et al.  Transfer learning in multi-agent systems through parallel transfer , 2013 .

[7]  Frans A. Oliehoek,et al.  A Concise Introduction to Decentralized POMDPs , 2016, SpringerBriefs in Intelligent Systems.

[8]  Bart De Schutter,et al.  Multi-agent Reinforcement Learning: An Overview , 2010 .

[9]  Demis Hassabis,et al.  Mastering the game of Go without human knowledge , 2017, Nature.

[10]  Yuval Tassa,et al.  Continuous control with deep reinforcement learning , 2015, ICLR.

[11]  Paul E. Utgoff,et al.  A Teaching Method for Reinforcement Learning , 1992, ML.

[12]  Pieter Abbeel,et al.  Emergence of Grounded Compositional Language in Multi-Agent Populations , 2017, AAAI.

[13]  Jonathan P. How,et al.  Learning Hierarchical Teaching in Cooperative Multiagent Reinforcement Learning , 2019, ArXiv.

[14]  Michael L. Littman,et al.  Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.

[15]  Sergey Levine,et al.  Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.

[16]  Brett Browning,et al.  A survey of robot learning from demonstration , 2009, Robotics Auton. Syst..

[17]  Yisong Yue,et al.  Coordinated Multi-Agent Imitation Learning , 2017, ICML.

[18]  Shimon Whiteson,et al.  Counterfactual Multi-Agent Policy Gradients , 2017, AAAI.

[19]  Yi Wu,et al.  Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments , 2017, NIPS.

[20]  Rich Caruana,et al.  Model compression , 2006, KDD '06.

[21]  Tucker R. Balch,et al.  Symmetry in Markov Decision Processes and its Implications for Single Agent and Multiagent Learning , 2001, ICML.

[22]  Jonathan P. How,et al.  Deep Decentralized Multi-task Multi-Agent Reinforcement Learning under Partial Observability , 2017, ICML.

[23]  Ioannis P. Vlahavas,et al.  Transfer Learning in Multi-Agent Reinforcement Learning Domains , 2011, EWRL.

[24]  Rob Fergus,et al.  Learning Multiagent Communication with Backpropagation , 2016, NIPS.

[25]  Geoffrey J. Gordon,et al.  A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning , 2010, AISTATS.

[26]  Felipe Leno da Silva,et al.  Simultaneously Learning and Advising in Multiagent Reinforcement Learning , 2017, AAMAS.

[27]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[28]  Sean Luke,et al.  Cooperative Multi-Agent Learning: The State of the Art , 2005, Autonomous Agents and Multi-Agent Systems.

[29]  Sergey Levine,et al.  Reinforcement Learning with Deep Energy-Based Policies , 2017, ICML.

[30]  Garrison W. Cottrell,et al.  Principled Methods for Advising Reinforcement Learning Agents , 2003, ICML.

[31]  T. Urbanik,et al.  Reinforcement learning-based multi-agent system for network traffic signal control , 2010 .

[32]  Razvan Pascanu,et al.  Policy Distillation , 2015, ICLR.

[33]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[34]  Jonathan P. How,et al.  Learning to Teach in Cooperative Multiagent Reinforcement Learning , 2018, AAAI.