Bush-Mosteller learning for a zero-sum repeated game with random pay-offs

This paper presents the design and analysis of a modified Bush-Mosteller reinforcement scheme used by both players in a zero-sum repeated game with random pay-offs. The study is based on the learning automata paradigm, and the Nash equilibrium that arises is analysed under a limiting average reward criterion. No information about the distribution of the pay-offs is available a priori. The novelty of the proposed adaptive strategy lies in incorporating a 'normalization procedure' into the standard Bush-Mosteller scheme, which allows it to operate not only with binary rewards but with any bounded stochastic rewards. The convergence (adaptation) and the convergence rate (rate of adaptation) are analysed, and the optimal design parameters of the adaptive procedure are derived. The resulting adaptation rate turns out to be of order o(n^{-1/3}).
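To make the idea of a Bush-Mosteller update driven by bounded (rather than binary) rewards concrete, the following is a minimal sketch and not the scheme analysed in the paper. It assumes a textbook linear reward-penalty form of the Bush-Mosteller update, a constant step size `gamma`, and a simple min-max rescaling with known pay-off bounds as a stand-in for the paper's normalization procedure, whose exact form is not given here. The names `normalize`, `bush_mosteller_step`, and `play` are hypothetical.

```python
import random


def normalize(payoff, lo, hi):
    """Map a payoff in [lo, hi] to a loss in [0, 1] (high payoff -> low loss).

    Assumption: the bounds lo < hi are known; this is a stand-in for the
    paper's normalization procedure, not a reproduction of it.
    """
    x = (hi - payoff) / (hi - lo)
    return min(1.0, max(0.0, x))


def bush_mosteller_step(p, action, loss, gamma):
    """One linear reward-penalty update of the mixed strategy p.

    The new strategy is a convex combination of p and a target vector that
    puts mass (1 - loss) on the chosen action and spreads `loss` evenly
    over the other actions, so the result stays on the simplex.
    """
    n = len(p)
    target = [(1.0 - loss) if i == action else loss / (n - 1) for i in range(n)]
    return [(1.0 - gamma) * p_i + gamma * t_i for p_i, t_i in zip(p, target)]


def play(payoff_matrix, noise, steps, gamma, seed=0):
    """Both players of a zero-sum matrix game learn from noisy payoffs."""
    rng = random.Random(seed)
    rows, cols = len(payoff_matrix), len(payoff_matrix[0])
    flat = [v for row in payoff_matrix for v in row]
    lo, hi = min(flat) - noise, max(flat) + noise  # bounds on the noisy payoff
    p = [1.0 / rows] * rows  # row player's mixed strategy
    q = [1.0 / cols] * cols  # column player's mixed strategy
    for _ in range(steps):
        i = rng.choices(range(rows), weights=p)[0]
        j = rng.choices(range(cols), weights=q)[0]
        # Random payoff to the row player; the column player receives its negative.
        payoff = payoff_matrix[i][j] + rng.uniform(-noise, noise)
        p = bush_mosteller_step(p, i, normalize(payoff, lo, hi), gamma)
        q = bush_mosteller_step(q, j, normalize(-payoff, -hi, -lo), gamma)
    return p, q


if __name__ == "__main__":
    # Matching pennies with uniform noise on each payoff.
    print(play([[1, -1], [-1, 1]], noise=0.5, steps=5000, gamma=0.01))
```

With a constant `gamma` the strategies in this sketch keep fluctuating; convergence results of the kind stated in the abstract typically rely on a decreasing step-size sequence and on the specific normalization, neither of which is reproduced here.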
