Learning of soccer player agents using a policy gradient method: Coordination between kicker and receiver during free kicks

The RoboCup Simulation League is recognized as a test bed for research on multi-agent learning. As an example of multi-agent learning in a soccer game, we deal with a learning problem between a kicker and a receiver when a direct free kick is awarded just outside the opponent's penalty area. In such a situation, to which point should the kicker kick the ball? We propose an evaluation function that expresses heuristics for judging how advantageous a target point is for safely sending and receiving a pass and for scoring. The heuristics include an interaction term between the kicker and the receiver that strengthens their coordination. To compute this interaction term, each agent is given a decision model of its teammate's action: the kicker predicts the receiver's action and vice versa. The evaluation function makes it possible to handle the large state space defined by the positions of the kicker, the receiver, and their opponents. The kicker selects the target point of the free kick by Boltzmann selection over the evaluation function. The parameters of the function are learned by a policy gradient method, a kind of reinforcement learning. The point to which the receiver should run to receive the ball is learned simultaneously in the same manner. Experiments demonstrate the effectiveness of our approach.
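The following is a minimal sketch, not the authors' code, of the two ingredients the abstract names: Boltzmann selection of a target point using a linear evaluation function, and a REINFORCE-style policy-gradient update of that function's weights. The feature definitions, the temperature and learning-rate constants, the candidate target points, and the reward value are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

TEMPERATURE = 1.0    # Boltzmann temperature T (assumed value)
LEARNING_RATE = 0.1  # policy-gradient step size (assumed value)

def features(state, action):
    """Heuristic feature vector f(s, a) for a candidate target point.
    Stands in for the paper's heuristics (pass safety, scoring chance,
    kicker-receiver interaction); these toy features are assumptions."""
    target = np.asarray(action, dtype=float)
    ball = np.asarray(state["ball"], dtype=float)
    goal = np.asarray(state["goal"], dtype=float)
    return np.array([
        -np.linalg.norm(target - goal),  # closeness of target to the goal
        -np.linalg.norm(target - ball),  # closeness of target to the ball
        1.0,                             # bias term
    ])

def boltzmann_policy(weights, state, actions):
    """pi(a|s) proportional to exp(E(s,a)/T), with E(s,a) = w . f(s,a)."""
    energies = np.array([weights @ features(state, a) for a in actions])
    z = (energies - energies.max()) / TEMPERATURE  # stabilize the exponential
    probs = np.exp(z)
    return probs / probs.sum()

def reinforce_update(weights, episode, reward):
    """REINFORCE-style update: w += alpha * R * sum_t grad_w log pi(a_t|s_t).
    For a Boltzmann policy over a linear evaluation function,
    grad_w log pi(a|s) = (f(s,a) - E_pi[f(s,.)]) / T."""
    grad = np.zeros_like(weights)
    for state, actions, chosen in episode:
        probs = boltzmann_policy(weights, state, actions)
        f_chosen = features(state, actions[chosen])
        f_mean = sum(p * features(state, a) for p, a in zip(probs, actions))
        grad += (f_chosen - f_mean) / TEMPERATURE
    return weights + LEARNING_RATE * reward * grad

# Minimal usage: one simulated free-kick decision and one update.
state = {"ball": (36.0, 0.0), "goal": (52.5, 0.0)}
candidate_targets = [(45.0, -8.0), (45.0, 0.0), (45.0, 8.0)]
weights = np.zeros(3)
probs = boltzmann_policy(weights, state, candidate_targets)
chosen = rng.choice(len(candidate_targets), p=probs)
weights = reinforce_update(weights, [(state, candidate_targets, chosen)],
                           reward=1.0)  # e.g. episode ended in a goal
```

In the paper's setting the receiver's run-to point would be learned with the same update rule in parallel; here only the kicker's choice is sketched.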
