Convergent multiple-timescales reinforcement learning algorithms in normal form games

We consider reinforcement learning algorithms in normal form games. Using two-timescales stochastic approximation, we introduce a model-free algorithm which is asymptotically equivalent to the smooth fictitious play algorithm, in that both result in asymptotic pseudotrajectories to the flow defined by the smooth best response dynamics. Both of these algorithms are shown to converge almost surely to a Nash distribution in two-player zero-sum games and N-player partnership games. However, there are simple games for which these, and most other adaptive processes, fail to converge; in particular, we consider the N-player matching pennies game and Shapley's variant of the rock-scissors-paper game. By extending stochastic approximation results to multiple timescales, we can allow each player to learn at a different rate. We show that this extension will converge for two-player zero-sum games and two-player partnership games, as well as for the two special cases we consider.
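
To make the two-timescales idea concrete, here is a minimal Python sketch for a two-player zero-sum game. It is an illustration under stated assumptions, not the algorithm as specified in the paper. Each player keeps action-value estimates that are updated with a larger step size (the fast timescale) and a mixed strategy that moves toward a smooth best response with a smaller step size (the slow timescale). The names `smooth_best_response` and `run_two_timescale`, the Boltzmann (softmax) form of the smooth best response, the temperature `tau`, and the step-size schedules `n ** -0.6` and `1 / n` are all assumptions chosen for illustration.

```python
import math
import random

# Row player's payoffs in two-player matching pennies; the column player
# receives the negative (zero-sum). Used here purely as an example game.
MATCHING_PENNIES = [[1.0, -1.0], [-1.0, 1.0]]


def smooth_best_response(q, tau):
    """Boltzmann (softmax) response to value estimates q at temperature tau.

    Assumption: the softmax is one common choice of smooth best response.
    """
    m = max(q)  # subtract the max for numerical stability
    weights = [math.exp((x - m) / tau) for x in q]
    total = sum(weights)
    return [w / total for w in weights]


def run_two_timescale(payoff, steps=20000, tau=0.2, seed=0):
    """Simulate a two-timescale, model-free learner for both players.

    Each player observes only its own action and reward (model-free).
    Returns the final mixed strategies and value estimates.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(payoff), len(payoff[0])
    pi = [[1.0 / n_rows] * n_rows, [1.0 / n_cols] * n_cols]
    q = [[0.0] * n_rows, [0.0] * n_cols]

    for n in range(1, steps + 1):
        lam = n ** -0.6   # fast step size for the value estimates
        alpha = 1.0 / n   # slow step size for strategies; alpha / lam -> 0

        # Both players sample actions from their current mixed strategies.
        a = rng.choices(range(n_rows), weights=pi[0])[0]
        b = rng.choices(range(n_cols), weights=pi[1])[0]
        r = payoff[a][b]
        actions = (a, b)
        rewards = (r, -r)  # zero-sum: column player gets -r

        for i in range(2):
            act = actions[i]
            # Fast timescale: update only the estimate of the played action.
            q[i][act] += lam * (rewards[i] - q[i][act])
            # Slow timescale: move the strategy toward the smooth best response.
            target = smooth_best_response(q[i], tau)
            pi[i] = [(1.0 - alpha) * p + alpha * t
                     for p, t in zip(pi[i], target)]

    return pi, q


if __name__ == "__main__":
    strategies, values = run_two_timescale(MATCHING_PENNIES)
    print("row strategy:", strategies[0])
    print("column strategy:", strategies[1])
```

Because the strategy step size vanishes relative to the value step size, the value estimates see a nearly stationary opponent, which is the intuition behind the asymptotic equivalence with smooth fictitious play. Letting the two players use different `alpha` schedules would correspond to the multiple-timescales extension, in which each player learns at a different rate.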
