Evolutionary Dynamics of Q-Learning over the Sequence Form

Multi-agent learning is a challenging open problem in artificial intelligence. An interesting connection is known between multi-agent learning algorithms and evolutionary game theory: the learning dynamics of some algorithms can be modeled as replicator dynamics with a mutation term. Inspired by the recent sequence-form replicator dynamics, we develop a new version of the Q-learning algorithm that works on the sequence form of an extensive-form game, thus achieving an exponential reduction in the length of the dynamics w.r.t. those of the normal form. The dynamics of the proposed algorithm can be modeled by the sequence-form replicator dynamics with a mutation term. We show that, although the sequence-form and normal-form replicator dynamics are realization equivalent, the Q-learning algorithms applied to the two forms have dynamics that are not realization equivalent. Differently from previous works on evolutionary game theory models of multi-agent learning, we provide an experimental evaluation showing the accuracy of the model.
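For concreteness, the known normal-form connection can be sketched as follows (a sketch in our own notation, not taken from the paper: two players, Boltzmann exploration with temperature $\tau$, learning rate $\alpha$, row player's payoff matrix $A$, own mixed strategy $x$, opponent strategy $y$; the result is due to Tuyls et al.):

\[
\dot{x}_i \;=\; \underbrace{\frac{\alpha}{\tau}\, x_i \big[ (A y)_i - x^{\top} A y \big]}_{\text{selection (replicator)}} \;+\; \underbrace{\alpha\, x_i \sum_{j} x_j \ln \frac{x_j}{x_i}}_{\text{mutation (exploration)}}.
\]

The selection term is the standard replicator dynamics, while the mutation term is induced by Boltzmann exploration; the contribution here is the analogous decomposition over the sequence form, where the number of equations is exponentially smaller.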
