Nash Equilibrium or Nash Bargaining? Choosing a Solution Concept for Multi-Agent Learning

Learning in many multi-agent settings is inherently repeated play. This calls into question the naive application of Nash equilibria in multi-agent learning and suggests, instead, the application of give-and-take principles of bargaining. We present an M-action, N-player social dilemma that encodes the key elements of the Prisoner's Dilemma and thereby highlights the importance of cooperation in multi-agent systems. The game is instructive because it characterizes social dilemmas with more than two agents and more than two choices. We show how several multi-agent learning algorithms behave in this social dilemma, including a satisficing algorithm based on [16] that is compatible with the bargaining perspective. This algorithm is a form of relaxation search that converges to a satisficing equilibrium without knowledge of other agents' actions and payoffs. Finally, we present theoretical results that characterize the behavior of the algorithm.
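The satisficing relaxation search described above can be illustrated with a minimal sketch of aspiration-based play: each agent keeps an aspiration level, repeats its action when the received payoff meets the aspiration, switches to a random action otherwise, and relaxes the aspiration toward recent payoffs. The class, payoff matrix, and parameter values below are illustrative assumptions, not the paper's exact algorithm; note that the agent uses only its own payoff, never the other agents' actions or payoffs.

```python
import random

class SatisficingAgent:
    """Aspiration-based satisficing learner (illustrative sketch)."""

    def __init__(self, n_actions, aspiration, rate=0.99, rng=None):
        self.n_actions = n_actions
        self.aspiration = aspiration  # current aspiration level
        self.rate = rate              # relaxation rate (lambda)
        self.rng = rng or random.Random()
        self.action = self.rng.randrange(n_actions)

    def act(self):
        return self.action

    def update(self, payoff):
        # Satisfied: keep the current action. Dissatisfied: switch at random.
        if payoff < self.aspiration:
            self.action = self.rng.randrange(self.n_actions)
        # Relax the aspiration toward the received payoff.
        self.aspiration = self.rate * self.aspiration + (1 - self.rate) * payoff

# Two-player Prisoner's Dilemma as the simplest instance (0 = cooperate, 1 = defect).
PAYOFFS = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}

a = SatisficingAgent(2, aspiration=5.0, rng=random.Random(1))
b = SatisficingAgent(2, aspiration=5.0, rng=random.Random(2))
for _ in range(2000):
    pa, pb = PAYOFFS[(a.act(), b.act())]
    a.update(pa)
    b.update(pb)
```

Starting aspirations above the mutual-cooperation payoff make unilateral defection unsatisfying in the long run, which is the mechanism by which this style of learner can settle on cooperative, bargaining-like outcomes rather than the one-shot Nash equilibrium.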

[1] Manuela Veloso, et al. An Analysis of Stochastic Game Theory for Multiagent Reinforcement Learning, 2000.

[2] D. Fudenberg, et al. The Theory of Learning in Games, 1998.

[3] Leslie Pack Kaelbling, et al. Playing is believing: The role of beliefs in multi-agent learning, 2001, NIPS.

[4] Craig Boutilier, et al. The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems, 1998, AAAI/IAAI.

[5] Peter Stone, et al. Leading Best-Response Strategies in Repeated Games, 2001, IJCAI.

[6] Nick Feltovich, et al. Reinforcement-based vs. Belief-based Learning Models in Experimental Asymmetric-information Games, 2000.

[7] W. Hamilton, et al. The evolution of cooperation, 1984, Science.

[8] Jeffrey S. Rosenschein, et al. Time and the Prisoner's Dilemma, 2007, ICMAS.

[9] E. Kalai, et al. Rational Learning Leads to Nash Equilibrium, 1993.

[10] Norman Frohlich, et al. When Is Universal Contribution Best for the Group?, 1996.

[11] Debraj Ray, et al. Evolving Aspirations and Cooperation, 1998.

[12] Michael A. Goodrich, et al. Satisficing and Learning Cooperation in the Prisoner's Dilemma, 2001, IJCAI.

[13] M. Goodrich, et al. Neglect Tolerant Teaming: Issues and Dilemmas, 2003.

[14] H. Kuhn. Classics in Game Theory, 1997.

[15] Dilip Mookherjee, et al. Institutional Structure and the Logic of Ongoing Collective Action, 1987, American Political Science Review.

[16] Michael P. Wellman, et al. Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm, 1998, ICML.

[17] M. Veloso, et al. Rational Learning of Mixed Equilibria in Stochastic Games, 2000.

[18] Henry Hamburger, et al. N-person Prisoner's Dilemma, 1973.

[19] Michael P. Wellman, et al. Experimental Results on Q-Learning for General-Sum Stochastic Games, 2000, ICML.

[20] Herbert A. Simon, et al. The Sciences of the Artificial, 1970.

[21] Peter Stone, et al. A Polynomial-Time Nash Equilibrium Algorithm for Repeated Games, 2003, EC '03.

[22] Craig Boutilier, et al. Sequential Optimality and Coordination in Multiagent Systems, 1999, IJCAI.

[23] Sandip Sen, et al. Evaluating concurrent reinforcement learners, 2000, ICMAS.