Exploration strategies in n-person general-sum multiagent reinforcement learning with sequential action selection

In this paper, two novel exploration strategies are proposed for n-person general-sum multiagent reinforcement learning with sequential action selection. The underlying learning process, called an extensive Markov game, is treated as a sequence of extensive-form games with perfect information. We introduce an estimated value for taking actions that accounts for the other agents' preferences, called the associative Q-value, which can be used to select actions probabilistically according to a Boltzmann distribution. Simulation results demonstrate the effectiveness of the proposed exploration strategies when applied in our previously introduced extensive-Q learning methods. Given the complexity of existing methods for computing Nash equilibrium points, whenever sequential action selection among agents can be assumed, extensive-Q learning is more convenient for dynamic-task multiagent systems with more than two agents.
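The Boltzmann (softmax) action selection mentioned above can be sketched generically: each action is chosen with probability proportional to the exponential of its (associative) Q-value divided by a temperature parameter. This is a minimal illustrative sketch, not the paper's implementation; the function name, signature, and temperature default are assumptions.

```python
import math
import random

def boltzmann_select(q_values, temperature=1.0, rng=random):
    """Sample an action index with probability proportional to exp(Q / temperature).

    High temperature -> near-uniform exploration; low temperature -> near-greedy.
    """
    # Subtract the max Q-value before exponentiating for numerical stability.
    m = max(q_values)
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Sample an index from the resulting Boltzmann distribution.
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r <= cum:
            return i
    return len(probs) - 1
```

With a very low temperature the selection is effectively greedy over the Q-values, while a high temperature spreads probability mass across actions, which is the exploration/exploitation knob the strategies in the paper tune.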
