The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems

Reinforcement learning can provide a robust and natural means for agents to learn how to coordinate their action choices in multiagent systems. We examine some of the factors that can influence the dynamics of the learning process in such a setting. We first distinguish reinforcement learners that are unaware of (or ignore) the presence of other agents from those that explicitly attempt to learn the value of joint actions and the strategies of their counterparts. We study (a simple form of) Q-learning in cooperative multiagent systems under these two perspectives, focusing on the influence of game structure and exploration strategies on convergence to (optimal and suboptimal) Nash equilibria. We then propose alternative, optimistic exploration strategies that increase the likelihood of convergence to an optimal equilibrium.
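To make the setting concrete, the following is a minimal Python sketch, not code from the paper: two independent Q-learners repeatedly playing an identical-payoff ("climbing-game"-style) matrix game with Boltzmann exploration, plus a simplified optimistic variant in which each agent explores according to the best reward an action has ever yielded rather than its running Q-value. The payoff matrix, learning rate, and temperature schedule are illustrative assumptions.

```python
import math
import random

# Illustrative identical-payoff matrix (climbing-game style): both agents
# receive PAYOFF[a0][a1]. The optimal joint action (0, 0) is risky to
# discover because miscoordination on it is heavily penalized.
PAYOFF = [
    [11, -30, 0],
    [-30, 7, 6],
    [0, 0, 5],
]
N_ACTIONS = 3
ALPHA = 0.1                  # learning rate (assumed)
TEMP0, DECAY, TEMP_MIN = 16.0, 0.999, 0.1  # temperature schedule (assumed)
EPISODES = 5000


def boltzmann_action(values, temp):
    """Sample an action with probability proportional to exp(value / temp)."""
    weights = [math.exp(v / temp) for v in values]
    r = random.uniform(0.0, sum(weights))
    cum = 0.0
    for a, w in enumerate(weights):
        cum += w
        if r <= cum:
            return a
    return N_ACTIONS - 1


def run(optimistic=False):
    # Independent learners: each keeps Q-values over its own actions only,
    # treating the other agent as part of a (nonstationary) environment.
    q = [[0.0] * N_ACTIONS for _ in range(2)]
    # Optimistic variant: track the best reward each action has ever produced
    # (initialized optimistically at 0) and explore according to it; a
    # simplified stand-in for the paper's optimistic exploration strategies.
    best = [[0.0] * N_ACTIONS for _ in range(2)]
    temp = TEMP0
    for _ in range(EPISODES):
        values = best if optimistic else q
        a0 = boltzmann_action(values[0], temp)
        a1 = boltzmann_action(values[1], temp)
        reward = PAYOFF[a0][a1]
        for i, a in ((0, a0), (1, a1)):
            q[i][a] += ALPHA * (reward - q[i][a])   # stateless Q-update
            best[i][a] = max(best[i][a], reward)
        temp = max(temp * DECAY, TEMP_MIN)
    return q


if __name__ == "__main__":
    greedy = lambda qi: max(range(N_ACTIONS), key=qi.__getitem__)
    print("plain:     ", [greedy(qi) for qi in run()])
    print("optimistic:", [greedy(qi) for qi in run(optimistic=True)])
```

In the spirit of the abstract, plain Boltzmann exploration tends to steer the learners toward safer, suboptimal equilibria because the average value of the risky optimal action is dragged down by miscoordination penalties; exploring by the maximum observed reward keeps the optimal joint action attractive and makes convergence to the optimal equilibrium more likely.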
