Simple reinforcement learning agents: Pareto beats Nash in an algorithmic game theory study

Abstract. Repeated play in games by simple adaptive agents is investigated. The agents use Q-learning, a special form of reinforcement learning, to direct the learning of behavioral strategies in a number of 2×2 games. The agents are able to effectively maximize the total wealth extracted, and this often leads to Pareto optimal outcomes. When the reward signals are sufficiently clear, Pareto optimal outcomes are largely achieved. The effect can select Pareto optimal outcomes that are not Nash equilibria, and it can select Pareto optimal outcomes from among multiple Nash equilibria.
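The abstract's two claims can be made concrete with a minimal Python sketch. This is not the authors' code. It shows two independent Q-learners repeatedly playing a 2×2 game. Several elements are hypothetical and chosen only for illustration:

- the payoff tables (a Prisoner's Dilemma and a Stag Hunt),
- epsilon-greedy exploration,
- the parameter values `alpha`, `gamma`, and `epsilon`,
- using the previous joint action as each learner's state,
- the helper names `pure_nash`, `pareto_optimal`, and `play`.

In the Prisoner's Dilemma table, mutual cooperation `(0, 0)` is Pareto optimal but not a Nash equilibrium. In the Stag Hunt table, both `(0, 0)` and `(1, 1)` are Nash equilibria, but only `(0, 0)` is Pareto optimal.

```python
"""Sketch: independent Q-learners in a repeated 2x2 game.

Assumptions (illustrative, not from the paper): the payoff tables, epsilon-greedy
exploration, the values of alpha/gamma/epsilon, and the use of the previous
joint action as each agent's state.
"""
import itertools
import random

ACTIONS = (0, 1)

# Hypothetical payoff tables: game[(row_action, col_action)] = (row_payoff, col_payoff).
# Prisoner's Dilemma: 0 = cooperate, 1 = defect.
PD = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}
# Stag Hunt: 0 = hunt stag, 1 = hunt hare.
SH = {(0, 0): (4, 4), (0, 1): (0, 3), (1, 0): (3, 0), (1, 1): (3, 3)}


def pure_nash(game):
    """Joint actions from which neither player gains by deviating alone."""
    equilibria = []
    for a, b in itertools.product(ACTIONS, repeat=2):
        row_ok = game[(a, b)][0] >= game[(1 - a, b)][0]
        col_ok = game[(a, b)][1] >= game[(a, 1 - b)][1]
        if row_ok and col_ok:
            equilibria.append((a, b))
    return equilibria


def pareto_optimal(game):
    """Joint actions whose payoff pair is not Pareto-dominated by another outcome."""
    def dominates(p, q):
        return p[0] >= q[0] and p[1] >= q[1] and p != q
    return [o for o in game if not any(dominates(game[x], game[o]) for x in game)]


def greedy(values, rng):
    """Index of the larger Q-value, breaking ties at random."""
    if values[0] == values[1]:
        return rng.randrange(2)
    return 0 if values[0] > values[1] else 1


def play(game, rounds=20000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Run two independent Q-learners; return how often each joint action occurred."""
    rng = random.Random(seed)
    # q[i][state][action]: player i's value estimate; state = previous joint action.
    q = [{s: [0.0, 0.0] for s in game} for _ in range(2)]
    state = (0, 0)  # arbitrary starting state (assumption)
    counts = {o: 0 for o in game}
    for _ in range(rounds):
        # Each player explores with probability epsilon, otherwise acts greedily.
        acts = tuple(
            rng.randrange(2) if rng.random() < epsilon else greedy(q[i][state], rng)
            for i in range(2)
        )
        rewards = game[acts]
        for i in range(2):
            # One-step Q-learning update toward r + gamma * max_a' Q(s', a').
            target = rewards[i] + gamma * max(q[i][acts])
            q[i][state][acts[i]] += alpha * (target - q[i][state][acts[i]])
        counts[acts] += 1
        state = acts
    return counts


if __name__ == "__main__":
    for name, g in (("Prisoner's Dilemma", PD), ("Stag Hunt", SH)):
        print(name, "| pure Nash:", pure_nash(g), "| Pareto optimal:", pareto_optimal(g))
        print("  joint-action counts:", play(g))
```

Running the script prints each game's pure-strategy Nash equilibria, its Pareto optimal joint actions, and the learners' joint-action counts. The counts depend on the seed and the assumed parameters. Treat them as exploratory, not as a reproduction of the paper's results.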
