Reinforcement learning of competitive and cooperative skills in soccer agents

This paper presents a comprehensive numerical analysis of the efficiency of various reinforcement learning (RL) techniques in an agent-based soccer game. SoccerBots is employed as a simulation testbed to analyze the effectiveness of RL techniques under various scenarios. A hybrid agent teaming framework is presented for investigating agent team architecture, learning abilities, and other specific behaviours. Novel RL algorithms are developed to verify the competitive and cooperative learning abilities of goal-oriented agents for decision-making. In particular, tile coding (TC), a function approximation technique, is used to prevent the state space from growing exponentially, thereby mitigating the curse of dimensionality. The underlying mechanism of eligibility traces is evaluated in terms of on-policy and off-policy procedures, as well as accumulating and replacing traces. The results are analyzed, and their implications for agent teaming and learning are discussed.
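
The two mechanisms named in the abstract, tile coding and the choice between accumulating and replacing eligibility traces, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`active_tiles`, `update_traces`), the 2-D state in the unit square, and all parameter values are assumptions chosen for illustration only.

```python
def active_tiles(x, y, n_tilings=4, tiles_per_dim=8, lo=0.0, hi=1.0):
    """Map a 2-D state in [lo, hi]^2 to one active tile index per tiling.

    Hypothetical layout: each tiling is a uniform grid shifted by a
    fraction of one tile width, so nearby states share most of their tiles.
    """
    width = (hi - lo) / tiles_per_dim
    side = tiles_per_dim + 1  # one extra tile per dimension absorbs the shift
    indices = []
    for t in range(n_tilings):
        offset = t * width / n_tilings
        col = min(int((x - lo + offset) / width), side - 1)
        row = min(int((y - lo + offset) / width), side - 1)
        # Each tiling owns its own block of side * side feature indices.
        indices.append(t * side * side + row * side + col)
    return indices


def update_traces(traces, active, gamma, lam, mode="accumulating"):
    """Decay every trace by gamma * lam, then mark the active features."""
    for i in range(len(traces)):
        traces[i] *= gamma * lam
    for i in active:
        if mode == "accumulating":
            traces[i] += 1.0   # repeated visits build the trace up past 1
        elif mode == "replacing":
            traces[i] = 1.0    # repeated visits reset the trace to exactly 1
        else:
            raise ValueError("mode must be 'accumulating' or 'replacing'")
    return traces


if __name__ == "__main__":
    n_features = 4 * 9 * 9  # n_tilings * side * side for the defaults above
    tiles = active_tiles(0.5, 0.5)
    acc = [0.0] * n_features
    rep = [0.0] * n_features
    for _ in range(2):  # visit the same state twice in a row
        update_traces(acc, tiles, gamma=0.9, lam=0.5, mode="accumulating")
        update_traces(rep, tiles, gamma=0.9, lam=0.5, mode="replacing")
    print(tiles, acc[tiles[0]], rep[tiles[0]])
```

With these assumed settings, the number of features grows with the number of tilings and tiles per dimension rather than with the number of distinct continuous states, and the two trace modes differ only in how a revisited feature is marked.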
