Cooperative strategy based on adaptive Q-learning for robot soccer systems

The objective of this paper is to develop a self-learning cooperative strategy for robot soccer systems. The strategy enables robots to cooperate and coordinate with each other to achieve the objectives of offense and defense. Through the mechanism of learning, the robots can learn from both successes and failures, and use these experiences to gradually improve their performance. The cooperative strategy is built on a hierarchical architecture. The first layer decides the role distribution, that is, how many defenders and sidekicks should play, according to the positional states. The second layer assigns the roles according to the decision of the first layer; we develop two algorithms for assigning the roles of attacker, defenders, and sidekicks. The last layer is the behavior layer, in which the robots execute their behavior commands and tasks based on their roles. The attacker is responsible for chasing the ball and attacking; the sidekicks are responsible for finding good positions; and the defenders are responsible for preventing the opponent from scoring. The robots' roles are not fixed: they can dynamically exchange roles with one another. For learning, we develop an adaptive Q-learning method, modified from traditional Q-learning. A simple ant experiment shows that the adaptive Q-learning is more effective than the traditional technique, and it is also successfully applied to the learning of the cooperative strategy.
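The abstract does not specify how the adaptive Q-learning modifies the traditional update rule. As a minimal illustrative sketch only, the following shows tabular Q-learning with a visit-count-based adaptive learning rate; the decay schedule `alpha0 / (1 + visits)` is an assumption for illustration, not the paper's actual adaptation rule.

```python
import random
from collections import defaultdict


class AdaptiveQLearner:
    """Tabular Q-learning with a per-(state, action) adaptive learning rate.

    NOTE: the visit-count decay alpha = alpha0 / (1 + visits) is a
    hypothetical choice; the paper's adaptation rule is not given here.
    """

    def __init__(self, actions, alpha0=0.5, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)       # (state, action) -> estimated value
        self.visits = defaultdict(int)    # (state, action) -> update count
        self.actions = actions
        self.alpha0, self.gamma, self.epsilon = alpha0, gamma, epsilon

    def choose(self, state):
        # Epsilon-greedy action selection over the learned Q-values.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Learning rate shrinks as a state-action pair is visited more often.
        self.visits[(state, action)] += 1
        alpha = self.alpha0 / (1 + self.visits[(state, action)])
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += alpha * (td_target - self.q[(state, action)])
```

In a role-based soccer setting, each robot would encode the positional state (e.g., ball and player coordinates discretized into regions) as `state` and its candidate role or behavior as `action`; rewards reflect offensive and defensive outcomes.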