The dynamics of generalized reinforcement learning

We consider reinforcement learning in games with both positive and negative payoffs. The Cross rule is the prototypical reinforcement learning rule in games that have only positive payoffs. We extend this rule to incorporate negative payoffs to obtain the generalized reinforcement learning rule. Applying this rule to a population game, we obtain the generalized reinforcement dynamic which describes the evolution of mixed strategies in the population. We apply the dynamic to the class of Rock–Scissor–Paper (RSP) games to establish local convergence to the interior rest point in all such games, including the bad RSP game.
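
As a point of reference, the Cross rule for positive payoffs can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's own formulation: payoffs are assumed to be normalized to [0, 1], and the function name `cross_update` is hypothetical. The generalized rule for negative payoffs is not specified in this abstract, so it is deliberately left out.

```python
def cross_update(probs, action, payoff):
    """Apply one step of the Cross reinforcement rule.

    Assumptions (not taken from the abstract): `probs` is a mixed
    strategy summing to 1, `action` is the index of the action just
    played, and `payoff` has been normalized to lie in [0, 1].
    """
    if not 0.0 <= payoff <= 1.0:
        raise ValueError("the Cross rule sketched here needs a payoff in [0, 1]")
    return [
        # The played action moves toward probability 1 in proportion to the payoff.
        p + payoff * (1.0 - p) if k == action
        # Every other action shrinks by the same proportion, so the total stays 1.
        else p * (1.0 - payoff)
        for k, p in enumerate(probs)
    ]


# Hypothetical usage: three actions, action 0 played, payoff 0.5.
updated = cross_update([0.5, 0.3, 0.2], action=0, payoff=0.5)
```

A zero payoff leaves the strategy unchanged and a payoff of one moves all probability to the chosen action, which is why this form of the rule only makes sense for non-negative payoffs.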
