Learning of soccer player agents using a policy gradient method: Coordination between kicker and receiver during free kicks

The RoboCup Simulation League is recognized as a test bed for research on multi-agent learning. As an example of multi-agent learning in a soccer game, we deal with a learning problem between a kicker and a receiver when a direct free kick is awarded just outside the opponent's penalty area. In such a situation, to which point should the kicker kick the ball? We propose an evaluation function that expresses heuristics for judging how advantageous a target point is for safely sending and receiving a pass and for scoring. The heuristics include an interaction term between the kicker and the receiver that strengthens their coordination. To compute this interaction term, each agent is given a decision model of its teammate's action: the kicker predicts the receiver's action and vice versa. The evaluation function makes it possible to handle the large state space defined by the positions of the kicker, the receiver, and their opponents. The kicker selects the target point of the free kick by Boltzmann selection over the evaluation function. The parameters of the function are learned by a policy gradient method, a kind of reinforcement learning. The point to which the receiver should run to receive the ball is learned simultaneously in the same manner. Experiments demonstrate the effectiveness of our approach.
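The following is a minimal sketch, not the authors' code, of the two ingredients the abstract names: Boltzmann selection of a target point using a linear evaluation function, and a REINFORCE-style policy-gradient update of that function's weights. The feature definitions, the temperature and learning-rate constants, the candidate target points, and the reward value are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

TEMPERATURE = 1.0    # Boltzmann temperature T (assumed value)
LEARNING_RATE = 0.1  # policy-gradient step size (assumed value)

def features(state, action):
    """Heuristic feature vector f(s, a) for a candidate target point.
    Stands in for the paper's heuristics (pass safety, scoring chance,
    kicker-receiver interaction); these toy features are assumptions."""
    target = np.asarray(action, dtype=float)
    ball = np.asarray(state["ball"], dtype=float)
    goal = np.asarray(state["goal"], dtype=float)
    return np.array([
        -np.linalg.norm(target - goal),  # closeness of target to the goal
        -np.linalg.norm(target - ball),  # closeness of target to the ball
        1.0,                             # bias term
    ])

def boltzmann_policy(weights, state, actions):
    """pi(a|s) proportional to exp(E(s,a)/T), with E(s,a) = w . f(s,a)."""
    energies = np.array([weights @ features(state, a) for a in actions])
    z = (energies - energies.max()) / TEMPERATURE  # stabilize the exponential
    probs = np.exp(z)
    return probs / probs.sum()

def reinforce_update(weights, episode, reward):
    """REINFORCE-style update: w += alpha * R * sum_t grad_w log pi(a_t|s_t).
    For a Boltzmann policy over a linear evaluation function,
    grad_w log pi(a|s) = (f(s,a) - E_pi[f(s,.)]) / T."""
    grad = np.zeros_like(weights)
    for state, actions, chosen in episode:
        probs = boltzmann_policy(weights, state, actions)
        f_chosen = features(state, actions[chosen])
        f_mean = sum(p * features(state, a) for p, a in zip(probs, actions))
        grad += (f_chosen - f_mean) / TEMPERATURE
    return weights + LEARNING_RATE * reward * grad

# Minimal usage: one simulated free-kick decision and one update.
state = {"ball": (36.0, 0.0), "goal": (52.5, 0.0)}
candidate_targets = [(45.0, -8.0), (45.0, 0.0), (45.0, 8.0)]
weights = np.zeros(3)
probs = boltzmann_policy(weights, state, candidate_targets)
chosen = rng.choice(len(candidate_targets), p=probs)
weights = reinforce_update(weights, [(state, candidate_targets, chosen)],
                           reward=1.0)  # e.g. episode ended in a goal
```

In the paper's setting the receiver's run-to point would be learned with the same update rule in parallel; here only the kicker's choice is sketched.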
