Learning When to Transfer among Agents: An Efficient Multiagent Transfer Learning Framework

Transfer learning has shown great potential to improve single-agent reinforcement learning (RL) efficiency by reusing previously learned policies. Inspired by this, team learning performance in multiagent settings could likewise be improved if agents reuse knowledge from one another while all agents interact with the environment and learn simultaneously. However, how each independent agent should selectively learn from other agents' knowledge remains an open problem. In this paper, we propose a novel multiagent transfer learning framework to improve the learning efficiency of multiagent systems. Our framework learns when to give advice to each agent, what advice to give, and when to terminate it, by modeling multiagent transfer as an option learning problem. We also propose a novel option learning algorithm, named Successor Representation Option (SRO) learning, which decouples the dynamics of the environment from the rewards to learn the option-value function under each agent's preference. The proposed framework can be easily combined with existing deep RL approaches. Experimental results show that it significantly accelerates the learning process and surpasses state-of-the-art deep RL methods in both learning efficiency and final performance, in both discrete and continuous action spaces.
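
To make the "decoupling dynamics from rewards" idea concrete, here is a minimal sketch based on the standard successor-representation decomposition, assuming linear reward features $\phi$ and option-conditioned successor features $\psi$; this is an illustrative assumption about the general idea the abstract alludes to, not the paper's exact SRO formulation:

\[
Q^{\pi}(s, o) \;=\; \psi^{\pi}(s, o)^{\top} w,
\qquad
\psi^{\pi}(s, o) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t \ge 0} \gamma^{t}\, \phi(s_t) \,\middle|\, s_0 = s,\; o_0 = o\right].
\]

Here $\psi^{\pi}(s, o)$ summarizes the environment dynamics induced by following option $o$ under policy $\pi$, while the weight vector $w$ encodes an individual agent's reward preference, so the same successor features can be reweighted to evaluate advice under different agents' objectives.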
