Learning of Soccer Player Agents Using a Policy Gradient Method: Pass Selection

Harukazu IGARASHI, Hitoshi FUKUOKA, Naoto SANO, Seji ISHIHARA
Shibaura Institute of Technology, Kinki University

This research develops a learning method for the pass selection problem of midfielders in RoboCup Soccer Simulation games. We apply a policy gradient method to this problem because its policy function can easily represent the various heuristics of pass selection. We implement the learning function in the midfielders’ programs of two well-known teams, UvA Trilearn 2003 and HELIOS. Experimental results show that our method effectively achieves clever pass selection by midfielders in full games. Moreover, within this framework, dribbling is learned as a special kind of pass, in essence a pass from the passer to itself. We also show that the improvement in pass selection achieved by learning helps to make a team much stronger.
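The idea of encoding pass-selection heuristics in a policy function and tuning them by a policy gradient can be illustrated with a small sketch. The abstract does not give the form of the policy, so everything below is an assumption made for illustration: a Boltzmann (softmax) policy over candidate receivers, a score that is a weighted sum of heuristic values, a REINFORCE-style update, and the names `softmax_policy`, `reinforce_update`, and the two feature columns are all hypothetical. Candidate 0 stands for the passer itself, so that choosing it corresponds to dribbling.

```python
import math

def softmax_policy(weights, features):
    """Boltzmann (softmax) distribution over candidate receivers.

    features[a] is the (hypothetical) heuristic vector of candidate a;
    a candidate's score is the weighted sum of its heuristic values.
    """
    scores = [sum(w * f for w, f in zip(weights, feats)) for feats in features]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_update(weights, features, chosen, reward, lr=0.1):
    """One REINFORCE-style step: w += lr * reward * grad log pi(chosen)."""
    probs = softmax_policy(weights, features)
    new_weights = []
    for k, w in enumerate(weights):
        # For a softmax over linear scores, the gradient of log pi(chosen)
        # is the chosen feature minus its expectation under the policy.
        expected = sum(p * feats[k] for p, feats in zip(probs, features))
        grad_log = features[chosen][k] - expected
        new_weights.append(w + lr * reward * grad_log)
    return new_weights

# Hypothetical heuristic values, e.g. [forward progress, safety from opponents].
features = [
    [0.2, 0.9],  # candidate 0: the passer itself (dribbling)
    [0.8, 0.4],  # candidate 1: teammate A
    [0.5, 0.1],  # candidate 2: teammate B
]
weights = [0.0, 0.0]
probs_before = softmax_policy(weights, features)

# Suppose a pass to teammate A was tried and judged successful (reward 1).
weights = reinforce_update(weights, features, chosen=1, reward=1.0)
probs_after = softmax_policy(weights, features)
```

With zero initial weights all candidates are equally likely. After one positively rewarded pass to teammate A, the weights shift so that candidates resembling A in their heuristic values become more probable. Treating the passer as just another candidate is one simple way to let dribbling and passing share the same policy.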
