Designing Learning Algorithms over the Sequence Form of an Extensive-Form Game

We focus on multi-agent learning over extensive-form games. When designing algorithms for extensive-form games, it is common to resort to tabular representations (i.e., the normal form, the agent form, and the sequence form). Each representation offers some advantages and suffers from some drawbacks, and it is not known which representation, if any, is the best one for multi-agent learning. In particular, a wide body of literature studies algorithms for the normal form, but this representation is prohibitive in practice since it is exponentially large in the size of the game tree. In this paper, we show that some learning algorithms defined over the normal form can be redefined over the sequence form so that the dynamics of the two algorithms are realization equivalent (i.e., they induce the same probability distribution over the outcomes). This allows an exponential compression of the representation and therefore makes such algorithms employable in practice.
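To make the size comparison concrete, the following is a minimal sketch of the sequence form in the spirit of von Stengel (1996); the notation is ours and serves only as an illustration, not as the paper's formal development. For player $i$, let $\Sigma_i$ be the set of her sequences (the lists of her own actions along a path from the root), let $\varnothing$ denote the empty sequence, and let $\sigma_h$ be the sequence leading to information set $h \in H_i$, where $A(h)$ is the set of actions available at $h$. A strategy is then a realization plan $x_i \colon \Sigma_i \to [0,1]$ satisfying the linear constraints

\[
x_i(\varnothing) = 1, \qquad \sum_{a \in A(h)} x_i(\sigma_h a) = x_i(\sigma_h) \quad \forall h \in H_i, \qquad x_i(\sigma) \ge 0 \quad \forall \sigma \in \Sigma_i .
\]

The number of sequences, and hence of variables, is linear in the size of the game tree, whereas the normal form has one pure strategy for every assignment of an action to each information set and is therefore exponentially large. Two strategies are realization equivalent when, against any fixed strategies of the opponents, they induce the same realization probability on every terminal node.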
