Learning sequences of actions in collectives of autonomous agents

In this paper we focus on the problem of designing a collective of autonomous agents that individually learn sequences of actions such that the resulting sequence of joint actions achieves a predetermined global objective. Directly applying Reinforcement Learning (RL) concepts to multi-agent systems often proves problematic: agents may work at cross-purposes, have difficulty evaluating their contribution to the global objective, or both. Accordingly, the crucial step in designing such systems is setting the reward for each agent's RL algorithm so that, as the agents maximize their individual rewards, the system reaches a globally "desirable" solution. In this work we consider a version of this problem involving multiple autonomous agents in a grid world. We use concepts from collective intelligence [15,23] to design agent rewards that are "aligned" with the global reward and "learnable," in that each agent can readily see how its own behavior affects its reward. We show that reinforcement learning agents using these rewards outperform both "natural" extensions of single-agent algorithms and global reinforcement learning solutions based on "team games".
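The "aligned" and "learnable" reward properties mentioned above can be illustrated with the difference reward from the collective-intelligence literature, D_i(z) = G(z) - G(z_{-i}), which measures agent i's marginal contribution to the global reward G. The toy coverage objective and the function names `global_reward` and `difference_reward` below are illustrative assumptions for this sketch, not the paper's actual grid-world rewards:

```python
from typing import List, Tuple

Cell = Tuple[int, int]  # (row, col) occupied by one agent

def global_reward(positions: List[Cell]) -> int:
    """G(z): a toy global objective -- the number of distinct
    grid cells covered by the collective."""
    return len(set(positions))

def difference_reward(positions: List[Cell], i: int) -> int:
    """D_i(z) = G(z) - G(z_{-i}): agent i's reward is the global
    reward minus the global reward with agent i removed.

    Aligned: raising D_i can never lower G, since the subtracted
    counterfactual term does not depend on agent i's action.
    Learnable: effects of the other agents largely cancel out,
    so agent i sees a cleaner signal for its own behavior."""
    counterfactual = positions[:i] + positions[i + 1:]
    return global_reward(positions) - global_reward(counterfactual)
```

For example, with agents at [(0,0), (0,0), (1,1)], the collective covers two cells, so G = 2; agent 0 receives D_0 = 0 (agent 1 already covers its cell, so it contributes nothing at the margin), while agent 2 receives D_2 = 1. A team-game setup would instead hand every agent the full G = 2, burying each agent's contribution in the noise of the others' actions.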

[1] Andrew G. Barto, et al. Improving Elevator Performance Using Reinforcement Learning, 1995, NIPS.

[2] Kagan Tumer, et al. Optimal Payoff Functions for Members of Collectives, 2001, Adv. Complex Syst.

[3] Kagan Tumer, et al. Adaptivity in agent-based routing for data networks, 1999, AGENTS '00.

[4] Kagan Tumer, et al. An Introduction to Collective Intelligence, 1999, ArXiv.

[5] Scott Shenker, et al. Learning in Network Contexts: Experimental Results from Simulations, 2001, Games Econ. Behav.

[6] L. Shapley, et al. Potential Games, 1994.

[7] Craig Boutilier. Multiagent Systems: Challenges and Opportunities for Decision-Theoretic Planning, 1999, AI Mag.

[9] Michael L. Littman, et al. Packet Routing in Dynamically Changing Networks: A Reinforcement Learning Approach, 1993, NIPS.

[10] Kagan Tumer, et al. Improving Simulated Annealing by Recasting it as a Non-Cooperative Game, 2001.

[11] Kagan Tumer, et al. Collective Intelligence and Braess' Paradox, 2000, AAAI/IAAI.

[12] Craig Boutilier, et al. The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems, 1998, AAAI/IAAI.

[13] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.

[14] Peter Dayan, et al. Q-learning, 1992, Machine Learning.

[15] A. Mas-Colell, et al. Microeconomic Theory, 1995.

[16] Manuela M. Veloso, et al. Multiagent Systems: A Survey from a Machine Learning Perspective, 2000, Auton. Robots.

[17] Michael P. Wellman, et al. Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm, 1998, ICML.

[18] Robert H. Crites, et al. Multiagent reinforcement learning in the Iterated Prisoner's Dilemma, 1996, Bio Systems.

[19] Theodore Groves, et al. Incentives in Teams, 1973.

[20] Nicholas R. Jennings, et al. A Roadmap of Agent Research and Development, 2004, Autonomous Agents and Multi-Agent Systems.

[21] Kagan Tumer, et al. Collective Intelligence for Control of Distributed Dynamical Systems, 1999, ArXiv.

[22] William Vickrey, et al. Counterspeculation, Auctions, and Competitive Sealed Tenders, 1961.

[23] Michael P. Wellman. A Market-Oriented Programming Environment and its Application to Distributed Multicommodity Flow Problems, 1993, J. Artif. Intell. Res.

[24] G. Hardin, et al. The Tragedy of the Commons, 1968, Green Planet Blues.

[25] Michael L. Littman, et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning, 1994, ICML.

[26] Kagan Tumer, et al. Using Collective Intelligence to Route Internet Traffic, 1998, NIPS.