Frequency adjusted multi-agent Q-learning

Multi-agent learning is a crucial method for controlling, or finding solutions for, systems in which more than one entity needs to be adaptive. In today's interconnected world, such systems are ubiquitous in many domains, including auctions in economics, swarm robotics in computer science, and politics in the social sciences. Multi-agent learning is inherently more complex than single-agent learning and is supported by a relatively thin theoretical framework. Recently, multi-agent learning dynamics have been linked to evolutionary game theory, which allows learning to be interpreted as an evolution of competing policies in the minds of the learning agents. The dynamical system from evolutionary game theory that has been linked to Q-learning predicts the expected behavior of the learning agents. Closer analysis, however, yields two interesting observations: the predicted behavior is not always the same as the actual behavior, and where the two deviate, the predicted behavior is the more desirable one. This article elucidates this discrepancy and, based on these insights, proposes Frequency Adjusted Q-learning (FAQ-learning). This variation of Q-learning adheres exactly to the predictions of the evolutionary model on an arbitrarily large part of the policy space. In addition to the theoretical discussion, experiments in the three classes of two-agent two-action games illustrate the superiority of FAQ-learning.
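
For intuition, the sketch below shows how a frequency-adjusted Q-learning update might look for a stateless two-action game, as the update is commonly presented: the ordinary Q-learning step for the chosen action is scaled inversely by that action's selection probability. The Boltzmann policy, the min(beta/x, 1) scaling, the payoff matrices, and all parameter values here are illustrative assumptions, not details taken from this article.

```python
import numpy as np

def softmax_policy(q, tau=0.1):
    """Boltzmann (softmax) action-selection probabilities from Q-values."""
    prefs = np.exp(q / tau)
    return prefs / prefs.sum()

def faq_update(q, action, reward, x, alpha=0.01, beta=0.001):
    """One frequency-adjusted Q-learning step for a stateless game (sketch).

    The ordinary Q-learning step is scaled by min(beta / x[action], 1), so
    rarely chosen actions receive proportionally larger updates and every
    action is updated at the same effective rate in expectation.
    """
    q = q.copy()
    lr = min(beta / x[action], 1.0) * alpha
    q[action] += lr * (reward - q[action])  # gamma = 0: single-state repeated game
    return q

# Illustration: two such learners playing a 2x2 zero-sum (matching-pennies-like)
# game. The payoff matrices are placeholders, not taken from the article.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])  # row player's payoffs
B = -A                                     # column player's payoffs

rng = np.random.default_rng(0)
q1, q2 = np.zeros(2), np.zeros(2)
for _ in range(50_000):
    x1, x2 = softmax_policy(q1), softmax_policy(q2)
    a1 = rng.choice(2, p=x1)
    a2 = rng.choice(2, p=x2)
    q1 = faq_update(q1, a1, A[a1, a2], x1)
    q2 = faq_update(q2, a2, B[a1, a2], x2)

print("final policies:", softmax_policy(q1), softmax_policy(q2))
```

The intuition behind the scaling is that actions chosen with low probability would otherwise be updated too rarely; equalizing the effective update rate across actions is what brings the learner's trajectory in line with the evolutionary prediction.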
