Accelerating Multiagent Reinforcement Learning by Equilibrium Transfer

An important approach to multiagent reinforcement learning (MARL) is equilibrium-based MARL, which adopts equilibrium solution concepts from game theory and requires agents to play equilibrium strategies at each state. However, most existing equilibrium-based MARL algorithms do not scale, because learning involves a large number of computationally expensive equilibrium computations (computing a Nash equilibrium, for example, is PPAD-hard). This paper finds, for the first time, that during the learning process of equilibrium-based MARL, the one-shot games induced by successive visits to a state often have the same or similar equilibria (for some states, more than 90% of the games induced by successive visits have similar equilibria). Motivated by this observation, this paper proposes equilibrium transfer to accelerate equilibrium-based MARL. The key idea of equilibrium transfer is to reuse previously computed equilibria whenever each agent has only a small incentive to deviate from them. By introducing a transfer loss and a transfer condition, a novel framework called equilibrium-transfer-based MARL is proposed. We prove that, although equilibrium transfer incurs transfer loss, equilibrium-based MARL algorithms can still converge to an equilibrium policy under certain assumptions. Experimental results on widely used benchmarks (e.g., the grid world game, soccer game, and wall game) show that the proposed framework 1) not only significantly accelerates equilibrium-based MARL (up to a 96.7% reduction in learning time) but also achieves higher average rewards than algorithms without equilibrium transfer, and 2) scales significantly better than algorithms without equilibrium transfer as the state/action space grows and the number of agents increases.
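As an illustrative sketch (not the paper's actual implementation), the reuse decision described above can be pictured as follows for a two-player one-shot game: measure each agent's incentive to deviate from a previously computed equilibrium under the current payoffs (a quantity playing the role of the transfer loss), and reuse the old equilibrium only when that incentive is below a threshold (the transfer condition). The payoff convention, function names, and threshold `eps` here are all hypothetical:

```python
def deviation_incentive(payoff, my_strategy, opp_strategy):
    """Gain this agent could obtain by unilaterally switching to its
    best pure response, given the opponent keeps its strategy.
    payoff[i][j] = this agent's payoff when it plays i and the opponent plays j."""
    expected = sum(my_strategy[i] * payoff[i][j] * opp_strategy[j]
                   for i in range(len(my_strategy))
                   for j in range(len(opp_strategy)))
    best_pure = max(sum(payoff[i][j] * opp_strategy[j]
                        for j in range(len(opp_strategy)))
                    for i in range(len(my_strategy)))
    return best_pure - expected

def transfer_loss(payoffs, strategies):
    """Worst-case deviation incentive over both agents (two-player case).
    payoffs[k] is agent k's payoff matrix, own action indexed first."""
    return max(
        deviation_incentive(payoffs[0], strategies[0], strategies[1]),
        deviation_incentive(payoffs[1], strategies[1], strategies[0]),
    )

def maybe_reuse(old_eq, payoffs, eps, solver):
    """Transfer condition: reuse old_eq if no agent can gain more than eps
    by deviating; otherwise fall back to an expensive equilibrium solver."""
    if transfer_loss(payoffs, old_eq) <= eps:
        return old_eq          # reuse: skip the equilibrium computation
    return solver(payoffs)     # recompute from scratch
```

At an exact equilibrium the deviation incentive of every agent is zero, so a previously computed equilibrium for a game with unchanged (or barely changed) payoffs passes the condition and the solver call is skipped, which is the source of the speedup the abstract reports.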
