论文信息 - Neural reinforcement learning to swing-up and balance a real pole

Neural reinforcement learning to swing-up and balance a real pole

This paper proposes a neural network based reinforcement learning controller that is able to learn control policies in a highly data efficient manner. This allows to apply reinforcement learning directly to real plants -neither a transition model nor a simulation model of the plant is needed for training. The only training information provided to the controller are transition experiences collected from interactions with the real plant. By storing these transition experiences explicitly, they can be reconsidered for updating the neural Q-function in every training step. This results in a stable learning process of a neural Q-value function. The algorithm is applied to learn the highly nonlinear and noisy task of swinging-up and balancing a real inverted pendulum. The amount of real time interaction needed to learn a highly effective policy from scratch was less than 14 minutes.

Martin A. Riedmiller

[1] Martin A. Riedmiller,et al. A direct adaptive method for faster backpropagation learning: the RPROP algorithm , 1993, IEEE International Conference on Neural Networks.

[2] Andrew W. Moore,et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function , 1994, NIPS.

[3] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming , 1995, ICML.

[4] Andrew G. Barto,et al. Reinforcement learning , 1998 .

[5] Martin A. Riedmiller. Concepts and Facilities of a Neural Reinforcement Learning Control Architecture for Technical Process Control , 1999, Neural Computing & Applications.

[6] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..

[7] Gerald Tesauro,et al. Practical issues in temporal difference learning , 1992, Machine Learning.

[8] Long Ji Lin,et al. Self-improving reactive agents based on reinforcement learning, planning and teaching , 1992, Machine Learning.

[9] Pierre Geurts,et al. Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..