Paper information - Multiple reward criterion for cooperative behavior acquisition in a multiagent environment

This paper discusses a vector-valued reward function for coordinating multiple behaviors, especially in a dynamically changing multiagent environment. Instead of the traditional weighted sum of several reward functions, we define a vector-valued value function that evaluates the current action strategy, introducing a discounted matrix to integrate the reward functions. With this extended value function, the learning robot can appropriately estimate the multiple future rewards from the environment without suffering from the weighting problem. The proposed method is applied to a simplified soccer game; computer simulation results are shown and discussed.
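The abstract does not give the update rule, so the following is only a minimal sketch of what a vector-valued value function with a matrix in place of the scalar discount factor might look like. The function name `vector_td_update`, the TD(0)-style form of the update, the learning rate `alpha`, and all numbers are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def vector_td_update(V, s, s_next, r, Gamma, alpha=0.1):
    """Return a copy of V after one TD(0)-style update of the value vector at state s.

    V      : array of shape (n_states, n_rewards), one value vector per state
    r      : reward vector of shape (n_rewards,) observed on the transition
    Gamma  : (n_rewards, n_rewards) matrix used instead of a scalar discount factor
    alpha  : learning rate (hypothetical default)
    """
    V = V.copy()
    target = r + Gamma @ V[s_next]   # vector-valued bootstrap target
    V[s] += alpha * (target - V[s])  # move the estimate toward the target
    return V


# Illustrative numbers only (not from the paper): 3 states, 2 reward components.
V = np.zeros((3, 2))
V[1] = [1.0, 2.0]
Gamma = np.array([[0.9, 0.0],
                  [0.1, 0.8]])
r = np.array([1.0, 0.0])

V_new = vector_td_update(V, s=0, s_next=1, r=r, Gamma=Gamma, alpha=0.5)
print(V_new[0])
```

In this sketch, setting `Gamma` to a scalar multiple of the identity matrix reduces the update to discounting each reward component independently; off-diagonal entries let one component's future value feed into another, and no weights are needed to collapse the rewards into a single scalar.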

E. Uchibe | M. Asada

[1] Jonas Karlsson, et al. Learning Multiple Goal Behavior via Task Decomposition and Dynamic Policy Merging, 1993.

[2] M. Matarić. Learning to Behave Socially, 1994.

[3] Manuela M. Veloso, et al. Team-Partitioned, Opaque-Transition Reinforcement Learning, 1998, RoboCup.

[4] Doina Precup, et al. Intra-Option Learning about Temporally Abstract Actions, 1998, ICML.

[5] Minoru Asada, et al. Behavior coordination for a mobile robot using modular reinforcement learning, 1996, Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems. IROS '96.