Paper Information - Experimental Analysis of Reward Design for Continuing Task in Multiagent Domains

In this paper, we discuss guidelines for the reward design problem, which defines when and how much reward should be given to the agent(s), within the context of reinforcement learning. We take keepaway soccer as a standard multiagent task that requires skilled teamwork. Designing a reward for this task is difficult for two reasons: (i) since it is a continuing task with no explicit goal to achieve, it is hard to tell when reward should be given to the agent(s); (ii) since it is a multiagent cooperative task, it is hard to decide what a fair share of reward is for each agent's contribution to achieving the goal. Through experiments, we show that the reward design has a major effect on the agents' behavior, and we introduce a successful reward function that makes agents perform keepaway better and in a more interesting way than the conventional one does. Finally, we explore the relationship between "reward design" and "acquired behaviors" from the viewpoint of teamwork.

Sachiyo Arai | Nobuyuki Tanaka
