Reinforcement Learning in Repeated Interaction Games

We study the long-run implications of reinforcement learning when two players repeatedly interact with one another over multiple rounds to play a finite-action game. Within each round, the players play the game many successive times with a fixed set of aspirations used to evaluate payoff experiences as successes or failures. The probability weight on successful actions is increased, while failures result in players trying alternative actions in subsequent rounds. The learning rule is supplemented by small amounts of inertia and random perturbations to the states of players. Aspirations are adjusted across successive rounds on the basis of the discrepancy between the average payoff and aspirations in the most recently concluded round. We define and characterize pure steady states of this model, and establish convergence to these under appropriate conditions. Pure steady states are shown to be individually rational, and are either Pareto-efficient or a protected Nash equilibrium of the stage game. Conversely, any Pareto-efficient and strictly individually rational action pair, or any strict protected Nash equilibrium, constitutes a pure steady state, to which the process converges from non-negligible sets of initial aspirations. Applications to games of coordination, cooperation, oligopoly, and electoral competition are discussed.
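The two-level process described above (reinforcement within a round, aspiration adjustment across rounds) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the Prisoner's Dilemma payoffs, the linear update with `step`, the rule that a payoff at or above the aspiration counts as a success, the partial-adjustment parameter `rate`, and all function names. Inertia and random perturbations are omitted for brevity.

```python
import random

# Hypothetical stage game (illustrative numbers): a Prisoner's Dilemma.
# Actions: 0 = cooperate, 1 = defect. Values are (row payoff, column payoff).
PAYOFFS = {
    (0, 0): (2.0, 2.0),
    (0, 1): (0.0, 3.0),
    (1, 0): (3.0, 0.0),
    (1, 1): (1.0, 1.0),
}


def reinforce(probs, action, payoff, aspiration, step=0.2):
    """Update one player's mixed strategy after a single play.

    Assumed rule: a payoff at or above the aspiration is a success and
    shifts weight toward the action played; otherwise weight shifts
    toward the alternative actions.
    """
    n = len(probs)
    if payoff >= aspiration:
        target = [1.0 if a == action else 0.0 for a in range(n)]
    else:
        target = [0.0 if a == action else 1.0 / (n - 1) for a in range(n)]
    return [(1 - step) * p + step * t for p, t in zip(probs, target)]


def play_round(probs, aspirations, plays, rng):
    """Play the stage game `plays` times with aspirations held fixed.

    Returns the updated strategies and each player's average payoff.
    """
    totals = [0.0, 0.0]
    for _ in range(plays):
        actions = tuple(rng.choices(range(2), weights=p)[0] for p in probs)
        payoffs = PAYOFFS[actions]
        probs = [
            reinforce(probs[i], actions[i], payoffs[i], aspirations[i])
            for i in range(2)
        ]
        totals = [totals[i] + payoffs[i] for i in range(2)]
    return probs, [t / plays for t in totals]


def adjust_aspirations(aspirations, averages, rate=0.5):
    """Move each aspiration part of the way toward the last round's average."""
    return [a + rate * (m - a) for a, m in zip(aspirations, averages)]


def simulate(rounds=20, plays=200, seed=0):
    """Run the two-level process from arbitrary (assumed) initial conditions."""
    rng = random.Random(seed)
    probs = [[0.5, 0.5], [0.5, 0.5]]
    aspirations = [1.5, 1.5]
    for _ in range(rounds):
        probs, averages = play_round(probs, aspirations, plays, rng)
        aspirations = adjust_aspirations(aspirations, averages)
    return probs, aspirations
```

Calling `simulate()` returns the final mixed strategies and aspirations for both players. Varying the initial aspirations in such a sketch is one way to explore informally how the starting point affects which action pair the process settles on, though this toy version makes no claim to reproduce the paper's results.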
