Online Learning of Robot Soccer Free Kick Plans Using a Bandit Approach

This paper presents an online learning approach for teams of autonomous soccer robots to select free kick plans. In robot soccer, free kicks present an opportunity to execute plans with relatively controllable initial conditions. However, the effectiveness of each plan depends heavily on the adversary, and there are few free kicks in each game, so the team must learn online from sparse observations. To make learning feasible, we first greatly reduce the planning space by framing the problem as a contextual multi-armed bandit, in which the actions are a set of pre-computed plans and the state is the position of the free kick on the field. During execution, we model the reward function for different free kicks using Gaussian Processes, and perform online learning using the Upper Confidence Bound algorithm. Results from a physics-based simulation show that the robots can adapt to a variety of realistic opponents to maximize their expected reward during free kicks.
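
The selection loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the kernel, the hyperparameters, or how plans share information, so the squared-exponential kernel, the values of `length_scale`, `noise_var`, and `beta`, the choice of one independent Gaussian Process per plan, and the names `PlanRewardGP` and `select_plan` are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two sets of 2-D field positions.

    The kernel choice and unit signal variance are assumptions of this sketch.
    """
    a = np.atleast_2d(a)
    b = np.atleast_2d(b)
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)


class PlanRewardGP:
    """GP regression of reward over free kick position for one plan (one arm)."""

    def __init__(self, noise_var=0.1, length_scale=1.0):
        self.noise_var = noise_var        # assumed observation noise
        self.length_scale = length_scale  # assumed spatial correlation length
        self.positions = []
        self.rewards = []

    def update(self, position, reward):
        """Record the reward observed after executing this plan at a position."""
        self.positions.append(np.asarray(position, dtype=float))
        self.rewards.append(float(reward))

    def predict(self, position):
        """Return the posterior mean and standard deviation at a position."""
        if not self.positions:
            return 0.0, 1.0  # prior: zero mean, unit variance
        X = np.array(self.positions)
        y = np.array(self.rewards)
        K = rbf_kernel(X, X, self.length_scale) + self.noise_var * np.eye(len(X))
        k_star = rbf_kernel(X, position, self.length_scale)[:, 0]
        mean = k_star @ np.linalg.solve(K, y)
        var = 1.0 - k_star @ np.linalg.solve(K, k_star)
        return float(mean), float(np.sqrt(max(var, 0.0)))


def select_plan(models, position, beta=2.0):
    """Pick the plan with the highest upper confidence bound at this position."""
    scores = []
    for gp in models:
        mean, std = gp.predict(position)
        scores.append(mean + np.sqrt(beta) * std)
    return int(np.argmax(scores))


# Usage: three hypothetical pre-computed plans, one free kick position.
models = [PlanRewardGP() for _ in range(3)]
kick_position = (1.0, 1.0)
plan = select_plan(models, kick_position)
models[plan].update(kick_position, reward=0.0)  # e.g. the plan failed
```

Because the upper confidence bound adds a bonus proportional to the posterior standard deviation, plans that have not yet been tried near a given position are preferred until evidence accumulates, which suits the setting of few free kicks per game.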
