Paper information - An Adjusted Payoff-Based Procedure for Normal Form Games

We study a simple adaptive model in the framework of an N-player normal form game. The model consists of a repeated game in which each player knows only her own action space and the payoff she receives at each stage, not those of the other agents. To update her mixed action, each player computes the average payoff vector she has obtained, using the number of times she has played each pure action. The resulting stochastic process is analyzed via the ODE method from stochastic approximation theory. We are interested in the convergence of the process to rest points of the related continuous dynamics. Results concerning almost sure convergence and convergence with positive probability are obtained and applied to a traffic game. We also provide some examples where convergence occurs with probability zero.
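
The update described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper. It assumes that each player maps her payoff estimates to a mixed action through a logit (softmax) rule with a hypothetical parameter `beta`, and that the estimate of the action just played is its running average over the number of times that action has been played. The names `simulate`, `softmax`, `payoffs`, and `beta` are illustrative only.

```python
import numpy as np

def softmax(x):
    """Numerically stable logit choice probabilities."""
    z = np.exp(x - np.max(x))
    return z / z.sum()

def simulate(payoffs, n_stages=5000, beta=2.0, seed=0):
    """Simulate a payoff-based learning procedure (illustrative sketch).

    payoffs[i] is an array of shape (A_1, ..., A_N) holding player i's payoff
    for every pure action profile. Players never read this array directly;
    each only observes her own realized payoff at each stage.
    """
    rng = np.random.default_rng(seed)
    n_players = len(payoffs)
    sizes = payoffs[0].shape
    # Per-player payoff estimates and play counts, one entry per pure action.
    estimates = [np.zeros(k) for k in sizes]
    counts = [np.zeros(k, dtype=int) for k in sizes]

    for _ in range(n_stages):
        # Assumption: logit choice rule applied to the current estimates.
        mixed = [softmax(beta * x) for x in estimates]
        actions = tuple(int(rng.choice(k, p=p)) for k, p in zip(sizes, mixed))
        for i in range(n_players):
            a = actions[i]
            g = payoffs[i][actions]  # player i's own realized payoff
            counts[i][a] += 1
            # Running average over the number of times action a was played.
            estimates[i][a] += (g - estimates[i][a]) / counts[i][a]

    final_mixed = [softmax(beta * x) for x in estimates]
    return estimates, counts, final_mixed
```

For a two-player game, `payoffs` would be a list of two matrices, for instance `simulate([u0, u1])` with `u0` and `u1` of shape `(2, 2)`. Dividing by the play count of the chosen action, rather than by the stage number, is what keeps rarely played actions from being updated with vanishing weight in this sketch.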

Mario Bravo
