Scaling Reinforcement Learning toward RoboCup Soccer

RoboCup simulated soccer presents many challenges to reinforcement learning methods, including a large state space, hidden and uncertain state, multiple agents, and long and variable delays in the effects of actions. We describe our application of episodic SMDP Sarsa(λ) with linear tile-coding function approximation and variable λ to learning higher-level decisions in a keepaway subtask of RoboCup soccer. In keepaway, one team, "the keepers," tries to keep control of the ball for as long as possible despite the efforts of "the takers." The keepers learn individually when to hold the ball and when to pass to a teammate, while the takers learn when to charge the ball-holder and when to cover possible passing lanes. Our agents learned policies that significantly outperformed a range of benchmark policies. We demonstrate the generality of our approach by applying it to a number of task variations, including different field sizes and different numbers of players on each team.
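To make the learning machinery concrete, the following is a minimal sketch of linear Sarsa(λ) with tile-coding function approximation, the core update described above. The toy one-dimensional state, the tiling parameters, and all names here are illustrative assumptions for exposition, not the paper's actual keepaway state variables or settings.

```python
import numpy as np

def tile_indices(x, num_tilings=8, tiles_per_tiling=10, low=0.0, high=1.0):
    """Map a scalar state x in [low, high) to one active tile per tiling.
    Each tiling is offset by a fraction of a tile width (coarse coding),
    so nearby states share most, but not all, active features."""
    indices = []
    scale = tiles_per_tiling / (high - low)
    for t in range(num_tilings):
        offset = t / num_tilings            # fractional offset per tiling
        idx = int((x - low) * scale + offset)
        idx = min(idx, tiles_per_tiling)    # clamp at the upper edge
        indices.append(t * (tiles_per_tiling + 1) + idx)
    return indices

class LinearSarsaLambda:
    """Episodic Sarsa(lambda) with a linear value function over binary
    tile features, using replacing eligibility traces (an assumed variant;
    the paper's exact trace and parameter choices may differ)."""

    def __init__(self, num_actions, num_features, alpha=0.1, gamma=1.0, lam=0.9):
        self.w = np.zeros((num_actions, num_features))  # learned weights
        self.z = np.zeros_like(self.w)                  # eligibility traces
        self.alpha, self.gamma, self.lam = alpha, gamma, lam

    def q(self, features, a):
        """Q(s, a) = sum of weights of the active tiles for action a."""
        return self.w[a, features].sum()

    def start_episode(self):
        self.z[:] = 0.0

    def update(self, features, a, reward, next_features=None, next_a=None):
        """One Sarsa(lambda) step; next_features=None marks episode end."""
        delta = reward - self.q(features, a)
        if next_features is not None:
            delta += self.gamma * self.q(next_features, next_a)
        self.z *= self.gamma * self.lam
        self.z[a, features] = 1.0           # replacing traces
        self.w += self.alpha * delta * self.z
```

In keepaway, each keeper would run such a learner over its own observations, with actions corresponding to macro-behaviors like holding the ball or passing to a teammate; because those behaviors take variable time, the discounting is applied per SMDP transition rather than per simulator tick.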
