Policy Gradient Approach for Learning of Soccer Player Agents: Pass Selection of Midfielders

This research develops a learning method for the pass selection problem of midfielders in RoboCup Soccer Simulation games. We apply a policy gradient method because its policy function can directly encode the various heuristics used for pass selection. We implement the learning function in the midfielders' programs of a well-known team, UvA Trilearn Base 2003. Experimental results show that the method learns effective pass selection by midfielders in full games. Moreover, within this framework, dribbling is learned as a special case of passing: a pass from the passer to itself. The results also show that the improved pass selection makes the team much stronger.
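As an illustration of how pass-selection heuristics might be represented in a policy function, the following is a minimal sketch and not the authors' implementation. It assumes a softmax (Boltzmann) policy over candidate receivers whose score is a weighted sum of heuristic values, updated with a REINFORCE-style rule. The function names, the two example heuristics, the temperature, and the learning rate are all hypothetical. Candidate 0 stands for the passer itself, so choosing it corresponds to dribbling.

```python
import math


def pass_probabilities(features, weights, temperature=1.0):
    """Softmax policy over candidates (assumed form, not from the paper).

    features[a][j] is the value of heuristic j for candidate receiver a.
    """
    scores = [sum(w * u for w, u in zip(weights, row)) / temperature
              for row in features]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def eligibility(features, weights, chosen, temperature=1.0):
    """Gradient of ln pi(chosen) with respect to each heuristic weight."""
    probs = pass_probabilities(features, weights, temperature)
    n = len(weights)
    expected = [sum(p * row[j] for p, row in zip(probs, features))
                for j in range(n)]
    return [(features[chosen][j] - expected[j]) / temperature
            for j in range(n)]


def reinforce_update(weights, episode, reward,
                     learning_rate=0.1, temperature=1.0):
    """One REINFORCE-style step: w += lr * reward * sum of eligibilities.

    episode is a list of (features, chosen) pairs, one per pass decision.
    """
    total = [0.0] * len(weights)
    for features, chosen in episode:
        e = eligibility(features, weights, chosen, temperature)
        total = [t + x for t, x in zip(total, e)]
    return [w + learning_rate * reward * t for w, t in zip(weights, total)]


# Hypothetical heuristics per candidate: [openness, forward progress].
# Candidate 0 is the passer itself (dribble); 1 and 2 are teammates.
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights = [0.0, 0.0]

before = pass_probabilities(features, weights)
# Suppose the agent dribbled (candidate 0) and the episode was rewarded.
weights = reinforce_update(weights, [(features, 0)], reward=1.0)
after = pass_probabilities(features, weights)
print(before[0], after[0])
```

With these assumed values, a positive reward after choosing candidate 0 raises the weight of the heuristic that favoured it, so the probability of selecting candidate 0 in the same situation increases.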
