Dynamic analysis of multiagent Q-learning with ε-greedy exploration

The development of mechanisms to understand and model the expected behaviour of multiagent learners is becoming increasingly important as the area rapidly finds application in a variety of domains. In this paper we present a framework for modelling the behaviour of Q-learning agents that use the ε-greedy exploration mechanism. To this end, we analyse a continuous-time version of the Q-learning update rule and study how it is affected by the presence of other agents and by the ε-greedy mechanism. We then model the problem as a system of difference equations, which we use to analyse the expected behaviour of the agents theoretically. The applicability of the framework is tested through experiments on typical games selected from the literature.
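The stochastic process whose expected dynamics the paper studies can be illustrated with a minimal sketch: two stateless Q-learners with ε-greedy action selection playing a repeated 2×2 matrix game. The Prisoner's Dilemma payoffs, the learning parameters, and all function names below are illustrative assumptions, not the paper's own experimental setup.

```python
import random

# Hypothetical 2x2 game (Prisoner's Dilemma payoffs, chosen for illustration).
# Key: (row_action, col_action) -> (row_reward, col_reward); 0 = cooperate, 1 = defect.
PAYOFFS = {
    (0, 0): (3, 3), (0, 1): (0, 5),
    (1, 0): (5, 0), (1, 1): (1, 1),
}

def epsilon_greedy(q, epsilon, rng):
    """Pick the greedy action with probability 1 - epsilon, else a uniform one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    return max(range(len(q)), key=lambda a: q[a])

def run(steps=5000, alpha=0.1, epsilon=0.1, seed=0):
    """Simulate two ε-greedy Q-learners in the repeated game; return final Q-values."""
    rng = random.Random(seed)
    q_row, q_col = [0.0, 0.0], [0.0, 0.0]
    for _ in range(steps):
        a = epsilon_greedy(q_row, epsilon, rng)
        b = epsilon_greedy(q_col, epsilon, rng)
        r_row, r_col = PAYOFFS[(a, b)]
        # Stateless Q-learning update: Q(a) <- Q(a) + alpha * (r - Q(a)).
        # The paper's continuous-time analysis replaces this stochastic step
        # with its expectation over the joint ε-greedy action distribution.
        q_row[a] += alpha * (r_row - q_row[a])
        q_col[b] += alpha * (r_col - q_col[b])
    return q_row, q_col
```

Because each agent's reward depends on the other learner's (changing) policy, the individual updates are not stationary, which is precisely why a dynamical-systems view of the expected update is useful.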
