FAQ-Learning in Matrix Games: Demonstrating Convergence Near Nash Equilibria, and Bifurcation of Attractors in the Battle of Sexes

This article studies Frequency Adjusted Q-learning (FAQ-learning), a variation of Q-learning that rescales each update to simulate simultaneous updating of all action values. The main contributions are empirical and theoretical support for the convergence of FAQ-learning to attractors near Nash equilibria in two-agent, two-action matrix games. These games fall into three representative classes: Matching Pennies, the Prisoners' Dilemma, and the Battle of the Sexes. The article shows that Matching Pennies and the Prisoners' Dilemma yield a single attractor of the learning dynamics, whereas the Battle of the Sexes exhibits a supercritical pitchfork bifurcation at a critical value of the temperature τ: below this value, the single attractor splits into two attractors and one repelling fixed point. Experiments illustrate that the distance between the fixed points of the FAQ-learning dynamics and the Nash equilibria tends to zero as the exploration parameter τ approaches zero.
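To make the setting concrete, the following is a minimal sketch of FAQ-learning in a 2x2 matrix game, assuming the standard formulation in which each agent keeps Q-values for its two actions, selects actions with a Boltzmann policy of temperature tau, and scales its Q-update by min(beta / x_a, 1) so that infrequently played actions are updated as if all actions were updated simultaneously. The payoff matrices, parameter values, and variable names below are illustrative choices, not the exact experimental setup of the paper.

    # Sketch of FAQ-learning for two agents in a 2x2 matrix game (illustrative).
    import numpy as np

    rng = np.random.default_rng(0)

    # Battle of the Sexes payoff bimatrix (row player A, column player B);
    # these particular numbers are a common textbook choice.
    A = np.array([[2.0, 0.0],
                  [0.0, 1.0]])   # row player's payoffs
    B = np.array([[1.0, 0.0],
                  [0.0, 2.0]])   # column player's payoffs

    alpha, beta, tau = 0.1, 0.01, 0.1   # learning rate, FAQ scaling, exploration temperature

    def boltzmann(q, tau):
        """Softmax (Boltzmann) action probabilities with temperature tau."""
        z = np.exp((q - q.max()) / tau)
        return z / z.sum()

    def faq_update(q, action, reward, policy):
        """FAQ-learning update: scale the Q-learning step by min(beta / x_a, 1)."""
        scale = min(beta / policy[action], 1.0)
        q[action] += scale * alpha * (reward - q[action])   # stateless game: no discounted future term
        return q

    q_row, q_col = np.zeros(2), np.zeros(2)
    for t in range(50_000):
        p_row, p_col = boltzmann(q_row, tau), boltzmann(q_col, tau)
        a_r = rng.choice(2, p=p_row)
        a_c = rng.choice(2, p=p_col)
        q_row = faq_update(q_row, a_r, A[a_r, a_c], p_row)
        q_col = faq_update(q_col, a_c, B[a_r, a_c], p_col)

    print("row policy:", boltzmann(q_row, tau), "col policy:", boltzmann(q_col, tau))

In a sketch like this one would expect the joint policy to settle near one of the two pure Nash equilibria of the Battle of the Sexes when tau is small, and near a single mixed point when tau is large, consistent with the pitchfork bifurcation described in the abstract; the exact critical temperature depends on the payoff values.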
