A multi-agent reinforcement learning algorithm based on Stackelberg game

Multi-agent reinforcement learning has attracted considerable attention because of its wide range of applications in engineering systems. In this paper, the control problem of large-scale multi-agent systems with multiple roles is formulated as a multiplayer Stackelberg game, which provides a new perspective on cooperative issues. A Stackelberg Q-learning algorithm is then proposed, and knowledge transfer is applied to improve the efficiency of the learning process. Finally, the proposed algorithm is applied to cognitive radio networks. Simulation results indicate that the Stackelberg Q-learning algorithm can efficiently improve the utility of the agents in the system and significantly reduce the interference caused by the jammer.
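The abstract does not give the update rule, so the following is only a minimal sketch of how a Stackelberg-style Q-learning step could look. It assumes a single leader and a single follower, tabular Q-values indexed as `q[state][leader_action][follower_action]`, and lowest-index tie-breaking. The function names `stackelberg_actions` and `stackelberg_q_update`, and the parameters `alpha` and `gamma`, are hypothetical and are not taken from the paper, whose multiplayer formulation and knowledge-transfer step are not reproduced here.

```python
def stackelberg_actions(q_leader, q_follower):
    """Return (leader_action, follower_action) for one state's stage game.

    q_leader[a][b] and q_follower[a][b] hold each agent's Q-value when the
    leader plays a and the follower plays b (an assumed table layout).
    """
    best_a, best_b, best_value = None, None, float("-inf")
    for a, follower_row in enumerate(q_follower):
        # The follower observes a and best-responds; ties go to the lowest index.
        b = max(range(len(follower_row)), key=follower_row.__getitem__)
        # The leader anticipates that response and keeps the best commitment.
        if q_leader[a][b] > best_value:
            best_a, best_b, best_value = a, b, q_leader[a][b]
    return best_a, best_b


def stackelberg_q_update(q, state, a, b, reward, next_state, next_joint,
                         alpha=0.1, gamma=0.9):
    """One temporal-difference step for a single agent's table q.

    next_joint is the Stackelberg joint action chosen in next_state,
    typically the result of stackelberg_actions on both agents' tables.
    """
    na, nb = next_joint
    target = reward + gamma * q[next_state][na][nb]
    q[state][a][b] += alpha * (target - q[state][a][b])
    return q[state][a][b]


if __name__ == "__main__":
    # Illustrative numbers only, not results from the paper.
    q_leader = [[3.0, 1.0], [4.0, 0.0]]
    q_follower = [[1.0, 2.0], [3.0, 0.0]]
    print(stackelberg_actions(q_leader, q_follower))
```

In this sketch each agent keeps its own table and applies the same update, with the bootstrap value taken at the leader-follower equilibrium of the next state rather than at each agent's own greedy action.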
