Multiple Reward Criterion for Cooperative Behavior Acquisition in a Multiagent Environment

A vector-valued reward function is discussed in the context of coordinating multiple behaviors, especially in a dynamically changing multiagent environment. Instead of the traditional weighted sum of several reward functions, we define a vector-valued value function that evaluates the current action strategy, introducing a discounted matrix to integrate the reward functions. With this extended value function, the learning robot can appropriately estimate the multiple future rewards it will receive from the environment without suffering from the weighting problem, that is, without having to choose the weights of a scalar sum in advance. The proposed method is applied to a simplified soccer game. Computer simulation results are shown and discussed.
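The idea of a vector-valued value function can be illustrated with a minimal sketch. The abstract does not give the exact form of the discounted matrix, so the recursion below (v_t = r_t + Gamma v_{t+1}, with Gamma a square matrix) is an assumption made only for illustration, and all names and numbers (`vector_return`, the two reward criteria, the entries of `gamma`) are hypothetical rather than taken from the paper.

```python
# Illustrative sketch only: the recursion and all values are assumptions,
# not the paper's actual formulation.

def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def vector_return(rewards, gamma):
    """Vector-valued discounted return of one episode.

    rewards: list of reward vectors r_0, r_1, ..., one per time step.
    gamma:   square discount matrix applied to the future return.
    Uses the backward recursion v_t = r_t + gamma @ v_{t+1}.
    """
    value = [0.0] * len(gamma)
    for reward in reversed(rewards):
        discounted = mat_vec(gamma, value)
        value = [r + d for r, d in zip(reward, discounted)]
    return value


def weighted_sum_return(rewards, weights, gamma_scalar):
    """Conventional scalar return: a fixed weighted sum of the criteria."""
    total = 0.0
    for reward in reversed(rewards):
        scalar_reward = sum(w * r for w, r in zip(weights, reward))
        total = scalar_reward + gamma_scalar * total
    return total


# Hypothetical episode with two criteria: (goal scored, collision penalty).
episode = [[0.0, 0.0], [0.0, -1.0], [1.0, 0.0]]
# A diagonal matrix discounts each criterion at its own rate; off-diagonal
# entries would let one criterion's future return affect another's.
gamma = [[0.9, 0.0], [0.0, 0.5]]

print(vector_return(episode, gamma))
print(weighted_sum_return(episode, [1.0, 1.0], 0.9))
```

In this sketch the vector return keeps the two criteria separate, so no weights have to be fixed before learning, whereas the scalar version collapses them into one number whose meaning depends on the chosen weights.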
