Exploration strategies in n-person general-sum multiagent reinforcement learning with sequential action selection

In this paper, two novel exploration strategies are proposed for n-person general-sum multiagent reinforcement learning with sequential action selection. The underlying learning process, called an extensive Markov game, is treated as a sequence of extensive-form games with perfect information. We introduce an estimated value for taking actions that accounts for the other agents' preferences, called the associative Q-value, which can be used to select actions probabilistically according to a Boltzmann distribution. Simulation results demonstrate the effectiveness of the proposed exploration strategies when applied in our previously introduced extensive-Q learning methods. Given the complexity of existing methods for computing Nash equilibrium points, whenever sequential action selection among agents can be assumed, extensive-Q learning is more convenient for dynamic-task multiagent systems with more than two agents.
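The Boltzmann (softmax) action selection mentioned above can be sketched generically: each action is chosen with probability proportional to the exponential of its (associative) Q-value divided by a temperature parameter. This is a minimal illustrative sketch, not the paper's implementation; the function name, signature, and temperature default are assumptions.

```python
import math
import random

def boltzmann_select(q_values, temperature=1.0, rng=random):
    """Sample an action index with probability proportional to exp(Q / temperature).

    High temperature -> near-uniform exploration; low temperature -> near-greedy.
    """
    # Subtract the max Q-value before exponentiating for numerical stability.
    m = max(q_values)
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Sample an index from the resulting Boltzmann distribution.
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r <= cum:
            return i
    return len(probs) - 1
```

With a very low temperature the selection is effectively greedy over the Q-values, while a high temperature spreads probability mass across actions, which is the exploration/exploitation knob the strategies in the paper tune.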
