Paper information - Improved neural fitted Q iteration applied to a novel computer gaming and learning benchmark

Neural batch reinforcement learning (RL) algorithms have recently been shown to be a powerful tool for model-free reinforcement learning problems. In this paper, we present a novel learning benchmark from the realm of computer games and apply a variant of a neural batch RL algorithm to this benchmark. Defining the learning problem and appropriately adjusting all relevant parameters is often a tedious task for the researcher who implements and investigates a learning approach. In RL, a suitable choice of the immediate cost function c is crucial, and, when multi-layer perceptron neural networks are used for value function approximation, the definition of c must be well aligned with the specific characteristics of this type of function approximator. Determining this alignment is especially tricky when no a priori knowledge about the task and, hence, about optimal policies is available. To this end, we propose a simple but effective dynamic scaling heuristic that can be seamlessly integrated into contemporary neural batch RL algorithms. We evaluate the effectiveness of this heuristic on the well-known pole swing-up benchmark as well as on the novel gaming benchmark we propose.
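The abstract does not spell out the scaling heuristic, so the following is only a minimal sketch of the general idea it alludes to: in cost-based fitted Q iteration, the training targets c + gamma * min Q(s', a') can be rescaled so that they fall inside the output range of a sigmoid unit. The min-max rescaling, the range [0.05, 0.95], and the names `fitted_q_targets` and `rescale_targets` are assumptions made for illustration, not the authors' method.

```python
from typing import Callable, List, Sequence, Tuple

# A transition is (state, action, immediate_cost, next_state, is_terminal).
Transition = Tuple[Sequence[float], int, float, Sequence[float], bool]


def fitted_q_targets(
    batch: List[Transition],
    q: Callable[[Sequence[float], int], float],
    actions: Sequence[int],
    gamma: float = 0.95,
) -> List[float]:
    """Cost-based fitted Q targets: c + gamma * min over a' of Q(s', a')."""
    targets = []
    for _state, _action, cost, next_state, terminal in batch:
        # Terminal transitions contribute no future cost.
        future = 0.0 if terminal else min(q(next_state, a) for a in actions)
        targets.append(cost + gamma * future)
    return targets


def rescale_targets(
    targets: List[float], low: float = 0.05, high: float = 0.95
) -> List[float]:
    """Min-max rescale targets into [low, high].

    Illustrative assumption only: the paper's dynamic scaling heuristic
    is not described in the abstract and may work differently.
    """
    t_min, t_max = min(targets), max(targets)
    if t_max == t_min:
        # All targets are equal; place them at the middle of the range.
        return [0.5 * (low + high) for _ in targets]
    span = t_max - t_min
    return [low + (high - low) * (t - t_min) / span for t in targets]
```

In a full batch RL loop, the rescaled targets would be used to retrain the network on the whole batch in each iteration; recomputing the scaling every iteration is what would make such a scheme "dynamic" in this sketch.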

Martin A. Riedmiller | Thomas Gabel | Christian Lutz
