Multi-agent learning in extensive games with complete information

Learning in a multi-agent system is difficult because the learning environment, jointly created by all the learning agents, is non-stationary. This paper studies multi-agent learning in complete-information extensive games (CEGs) and provides two provably convergent algorithms for this model. Both algorithms exploit the special structure of CEGs and guarantee both individual and collective convergence. Our work contributes to the multi-agent learning literature in several respects: (1) we identify learning in CEGs as a model of multi-agent learning and provide two provably convergent algorithms for it; (2) we explicitly address the environment-shifting problem and show how patient agents can collectively learn to play equilibrium strategies; (3) much game-theoretic work on learning relies on fictitious play, which requires agents to build beliefs about their opponents, whereas for learning in CEGs we show that agents can collectively converge to the subgame perfect equilibrium (SPE) by repeatedly reinforcing their previous success/failure experience, with no belief building necessary. A sketch of this belief-free reinforcement idea follows below.
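
To make the belief-free reinforcement idea concrete, here is a minimal Python sketch of valuation-style learning on a toy two-stage entry game. The game tree, payoffs, learning rate, and exploration schedule are illustrative assumptions chosen for this example; the sketch is not the paper's actual algorithms, only the general mechanism of reinforcing one's own realized payoffs along the played path.

```python
import random

# A tiny complete-information extensive game (an entry game, assumed for illustration):
#   Player 0 at the root chooses "Out" (payoffs (2, 2)) or "In".
#   After "In", player 1 chooses "Fight" (0, 0) or "Accommodate" (3, 1).
# Backward induction gives the SPE ("In", "Accommodate").
TREE = {
    "root": {"player": 0, "actions": {"Out": (2, 2), "In": "post_entry"}},
    "post_entry": {"player": 1, "actions": {"Fight": (0, 0), "Accommodate": (3, 1)}},
}

ALPHA, EPSILON, EPISODES = 0.1, 0.1, 5000  # illustrative parameters

# Each agent keeps a valuation of every (node, action) pair it owns,
# updated only from its own realized payoffs -- no beliefs about the opponent.
valuation = {node: {a: 0.0 for a in spec["actions"]} for node, spec in TREE.items()}

def choose(node):
    """Epsilon-greedy choice based on the node owner's current valuations."""
    actions = list(TREE[node]["actions"])
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: valuation[node][a])

for _ in range(EPISODES):
    # Play one episode from the root, recording which action was taken at which node.
    node, path = "root", []
    while True:
        action = choose(node)
        path.append((node, action))
        outcome = TREE[node]["actions"][action]
        if isinstance(outcome, tuple):   # a leaf: payoff vector for both players
            payoffs = outcome
            break
        node = outcome                   # an internal node: keep playing
    # Reinforce each visited (node, action) toward the mover's own realized payoff.
    for node, action in path:
        me = TREE[node]["player"]
        valuation[node][action] += ALPHA * (payoffs[me] - valuation[node][action])

greedy = {node: max(spec["actions"], key=lambda a: valuation[node][a])
          for node, spec in TREE.items()}
print(greedy)  # typically {'root': 'In', 'post_entry': 'Accommodate'}, i.e. the SPE
```

With enough exploration, the post-entry player's valuation of "Accommodate" rises above "Fight", after which the root player's valuation of "In" climbs past "Out", so greedy play settles on the SPE of this toy game; this illustrates the success/failure reinforcement described above, not a proof of the paper's convergence guarantees.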
