Learning To Cooperate in a Social Dilemma: A Satisficing Approach to Bargaining

Learning in many multi-agent settings is inherently repeated play. This calls into question the naive application of single-play Nash equilibria to multi-agent learning and suggests, instead, applying the give-and-take principles of bargaining. We modify and analyze a satisficing algorithm, based on (Karandikar et al., 1998), that is compatible with the bargaining perspective. The algorithm is a form of relaxation search that converges to a satisficing equilibrium without knowledge of game payoffs or of other agents' actions. We then develop an M-action, N-player social dilemma that encodes the key elements of the Prisoner's Dilemma. This game is instructive because it characterizes social dilemmas with more than two agents and more than two choices. We show how several multi-agent learning algorithms behave in this social dilemma, and demonstrate that the satisficing algorithm converges, with high probability, to a Pareto-efficient solution in self-play and to the single-play Nash equilibrium against selfish agents. Finally, we present theoretical results that characterize the behavior of the algorithm.
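The aspiration-based satisficing rule described above can be sketched as follows. This is a minimal illustration in the spirit of (Karandikar et al., 1998), not the paper's exact algorithm: the class name, the relaxation rate `lam`, and the two-player Prisoner's Dilemma payoffs are illustrative assumptions. Each agent repeats its action when the payoff meets its aspiration, switches otherwise, and relaxes its aspiration toward the payoff it received — all without observing the payoff matrix or the other agent's choice.

```python
import random

# Illustrative Prisoner's Dilemma payoffs for the row player:
# (my_action, other_action) -> reward. Values are assumptions.
PD_PAYOFFS = {
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

class SatisficingAgent:
    def __init__(self, actions, aspiration, learning_rate=0.99, rng=None):
        self.actions = list(actions)
        self.aspiration = aspiration   # current aspiration level
        self.lam = learning_rate       # relaxation rate in [0, 1)
        self.rng = rng or random.Random()
        self.action = self.rng.choice(self.actions)

    def update(self, payoff):
        # Satisficing rule: keep the action if the payoff met the
        # aspiration; otherwise switch to a random alternative.
        if payoff < self.aspiration:
            self.action = self.rng.choice(
                [a for a in self.actions if a != self.action])
        # Relaxation search: aspiration drifts toward the payoff.
        self.aspiration = self.lam * self.aspiration + (1 - self.lam) * payoff
        return self.action

def play(agents, steps):
    """Repeated play between two satisficing agents; returns final actions."""
    a, b = agents
    for _ in range(steps):
        ra = PD_PAYOFFS[(a.action, b.action)]
        rb = PD_PAYOFFS[(b.action, a.action)]
        a.update(ra)
        b.update(rb)
    return a.action, b.action
```

With initial aspirations set above the mutual-defection payoff, a pair of such agents tends to settle on mutual cooperation, since only the cooperative outcome can sustain both aspirations once they have relaxed.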

[1] Craig Boutilier et al. The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems, 1998, AAAI/IAAI.

[2] E. Kalai et al. Rational Learning Leads to Nash Equilibrium, 1993.

[3] Sandip Sen et al. Evaluating concurrent reinforcement learners, 2000, Proceedings Fourth International Conference on MultiAgent Systems.

[4] D. Fudenberg et al. The Theory of Learning in Games, 1998.

[5] M. Veloso et al. Rational Learning of Mixed Equilibria in Stochastic Games, 2000.

[6] Henry Hamburger et al. N-person Prisoner's Dilemma, 1973.

[7] Nick Feltovich et al. Reinforcement-based vs. Belief-based Learning Models in Experimental Asymmetric-information Games, 2000.

[8] M. Goodrich et al. Neglect Tolerant Teaming: Issues and Dilemmas, 2003.

[9] Howard Raiffa et al. Games and Decisions, 1958.

[10] Craig Boutilier et al. Sequential Optimality and Coordination in Multiagent Systems, 1999, IJCAI.

[11] Michael P. Wellman et al. Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm, 1998, ICML.

[12] Michael A. Goodrich et al. Satisficing and Learning Cooperation in the Prisoner's Dilemma, 2001, IJCAI.

[13] Herbert A. Simon et al. The Sciences of the Artificial, 1970.

[14] Debraj Ray et al. Evolving Aspirations and Cooperation, 1998.

[15] D. B. Fogel et al. Special issue on the prisoner's dilemma, 1996, BioSystems.

[16] Jeffrey S. Rosenschein et al. Time and the Prisoner's Dilemma, 2007, ICMAS.

[17] Norman Frohlich et al. When Is Universal Contribution Best for the Group?, 1996.

[18] Dilip Mookherjee et al. Institutional Structure and the Logic of Ongoing Collective Action, 1987, American Political Science Review.

[19] W. Hamilton et al. The evolution of cooperation, 1984, Science.

[20] Wynn C. Stirling et al. Satisficing Equilibria: A Non-Classical Theory of Games and Decisions, 2002, Autonomous Agents and Multi-Agent Systems.

[21] Robert H. Crites et al. Multiagent reinforcement learning in the Iterated Prisoner's Dilemma, 1996, BioSystems.

[22] Michael P. Wellman et al. Experimental Results on Q-Learning for General-Sum Stochastic Games, 2000, ICML.