Paper Information - Optimal Reward Functions in Distributed Reinforcement Learning

Optimal Reward Functions in Distributed Reinforcement Learning

We consider the design of multi-agent systems so as to optimize an overall world utility function when (1) those systems lack centralized communication and control, and (2) each agent runs a distinct Reinforcement Learning (RL) algorithm. A crucial issue in such design problems is to initialize/update each agent's private utility function, so as to induce the best possible world utility. Traditional 'team game' solutions to this problem sidestep this issue and simply assign to each agent the world utility as its private utility function. In previous work we used the 'Collective Intelligence' framework to derive a better choice of private utility functions, one that results in world utility performance up to orders of magnitude superior to that ensuing from use of the team game utility. In this paper we extend these results. We derive the general class of private utility functions that both are easy for the individual agents to learn and that, if learned well, result in high world utility. We demonstrate experimentally that using these new utility functions can result in significantly improved performance over that of our previously proposed utility, over and above that previous utility's superiority to the conventional team game utility.
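The contrast between a team game utility and an agent-specific private utility can be illustrated with a small sketch. The abstract gives no formulas, so everything below is an assumption made for illustration: the congestion-style `world_utility`, the `capacity` parameter, and the difference-style private utility (world utility minus world utility with the agent removed) are hypothetical and are not claimed to be the class of utilities derived in the paper.

```python
import math
from collections import Counter


def world_utility(choices, capacity=2.0):
    """Assumed world utility: a resource used by n agents contributes n * exp(-n / capacity)."""
    counts = Counter(choices)
    return sum(n * math.exp(-n / capacity) for n in counts.values())


def team_game_utilities(choices, capacity=2.0):
    """Team game: every agent's private utility is simply the world utility."""
    g = world_utility(choices, capacity)
    return [g] * len(choices)


def difference_utilities(choices, capacity=2.0):
    """Assumed alternative: world utility minus the world utility with agent i removed."""
    g = world_utility(choices, capacity)
    utilities = []
    for i in range(len(choices)):
        without_i = choices[:i] + choices[i + 1:]
        utilities.append(g - world_utility(without_i, capacity))
    return utilities


# Five agents each pick one resource (hypothetical joint action).
before = [0, 0, 0, 1, 2]
# Only the last agent changes its choice.
after = [0, 0, 0, 1, 1]

print("team game, agent 0:", team_game_utilities(before)[0], team_game_utilities(after)[0])
print("difference, agent 0:", difference_utilities(before)[0], difference_utilities(after)[0])
```

In this toy setting, agent 0's team game reward changes when a different agent switches resources, while its difference-style reward does not, which is one informal reading of a private utility being "easy for the individual agents to learn".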

Kagan Tumer | David H. Wolpert

[1] Craig Boutilier, et al. The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems, 1998, AAAI/IAAI.

[2] Andrew G. Barto, et al. Improving Elevator Performance Using Reinforcement Learning, 1995, NIPS.

[3] Gerhard Weiss, et al. Multiagent Systems, 1999.

[4] Michael P. Wellman, et al. Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm, 1998, ICML.

[5] Y. Shoham, et al. Editorial: economic principles of multi-agent systems, 1997.

[6] Kagan Tumer, et al. An Introduction to Collective Intelligence, 1999, ArXiv.

[7] Kagan Tumer, et al. Collective Intelligence for Control of Distributed Dynamical Systems, 1999, ArXiv.

[8] Onn Shehory, et al. Anytime Coalition Structure Generation with Worst Case Guarantees, 1998, AAAI/IAAI.

[9] Kagan Tumer, et al. Collective Intelligence and Braess' Paradox, 2000, AAAI/IAAI.

[10] Nicholas R. Jennings, et al. A Roadmap of Agent Research and Development, 2004, Autonomous Agents and Multi-Agent Systems.

[11] Yicheng Zhang, et al. On the minority game: Analytical and numerical studies, 1998, cond-mat/9805084.

[12] Kagan Tumer, et al. Using Collective Intelligence to Route Internet Traffic, 1998, NIPS.

[13] Michael R. Genesereth, et al. Software agents, 1994, CACM.

[14] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.

[15] G. Hardin, et al. The Tragedy of the Commons, 1968, Green Planet Blues.