Learning in Multi-agent Systems with Sparse Interactions by Knowledge Transfer and Game Abstraction

In many multi-agent systems, interactions between agents are sparse, and exploiting this sparsity in multi-agent reinforcement learning (MARL) can improve learning performance. Agents may also have already learned single-agent knowledge (e.g., a local value function) before the multi-agent learning process begins. In this work, we investigate how such knowledge can be used to learn better policies in multi-agent systems with sparse interactions. We adopt game theory-based MARL as the underlying learning approach because it coordinates agents more effectively. We contribute three knowledge transfer mechanisms. The first is value function transfer, which directly transfers agents' local value functions into the learning algorithm. The second is selective value function transfer, which transfers value functions only in states where the environmental dynamics change little. The third is model transfer-based game abstraction, which further improves the first two mechanisms by abstracting the one-shot game in each state and reducing equilibrium computation. Experimental results on benchmark tasks show that with these three mechanisms, all of the tested game theory-based MARL algorithms improve dramatically and achieve better asymptotic performance than the state-of-the-art algorithm CQ-learning.
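To make the first mechanism concrete, the sketch below seeds each agent's multi-agent (joint-action) Q-table with a previously learned single-agent Q-table before game theory-based learning begins. This is a minimal illustration under assumed names (n_states, n_actions, single_q, joint_q), not the authors' implementation.

```python
import numpy as np

# Toy problem size; two agents, each with a small discrete state/action space.
n_states, n_actions = 10, 4
n_agents = 2

# Local value functions assumed to have been learned earlier in single-agent settings.
single_q = [np.random.rand(n_states, n_actions) for _ in range(n_agents)]

# Multi-agent Q-tables indexed by (state, own action, other agent's action).
joint_q = [np.zeros((n_states, n_actions, n_actions)) for _ in range(n_agents)]

# Value function transfer: agent i's value for a joint action is initialized
# with its local value for its own action, ignoring the other agent's choice.
for i in range(n_agents):
    for s in range(n_states):
        for a_own in range(n_actions):
            joint_q[i][s, a_own, :] = single_q[i][s, a_own]

# A game theory-based MARL algorithm (e.g., Nash-Q) would then refine joint_q
# by computing an equilibrium of the stage game defined at each visited state.
```

Selective value function transfer would apply the same seeding only in states whose dynamics are judged to change little between the single-agent and multi-agent settings.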
