Light-weight Reinforcement Learning with Function Approximation for Real-life Control Tasks

Despite the impressive achievements of reinforcement learning (RL) in playing Backgammon as early as the beginning of the 1990s, relatively few successful real-world applications of RL have been reported since. One reason may be the tendency of RL research to focus on discrete Markov Decision Processes, which makes it difficult to handle tasks with continuous-valued features. Another may be the trend towards increasingly complex mathematical RL models that are difficult to implement and operate. Both issues are addressed in this paper by using the gradient-descent Sarsa(λ) method together with a Normalised Radial Basis Function (NRBF) neural network. Experimental results on three typical benchmark control tasks show that these methods outperform most previously reported results on the same tasks, while remaining computationally light enough to implement even as embedded software. The presented results can therefore serve as a reference for both the learning performance and the computational applicability of RL in real-life applications.
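To make the combination of methods named in the abstract concrete, the following is a minimal sketch of gradient-descent Sarsa(λ) operating on a Normalised Radial Basis Function feature layer. It is not the paper's implementation: the RBF centres, widths, hyper-parameter values and the choice of replacing eligibility traces are illustrative assumptions.

```python
import numpy as np

class NRBFSarsaLambda:
    """Sketch: gradient-descent Sarsa(lambda) over normalised RBF features.
    All hyper-parameters below are assumed values, not the paper's settings."""

    def __init__(self, centres, widths, n_actions,
                 alpha=0.1, gamma=0.99, lam=0.9, epsilon=0.1):
        self.centres = np.asarray(centres)           # (n_features, state_dim)
        self.widths = np.asarray(widths)             # (n_features,) RBF widths
        self.n_actions = n_actions
        self.alpha, self.gamma = alpha, gamma
        self.lam, self.epsilon = lam, epsilon
        n_features = self.centres.shape[0]
        self.w = np.zeros((n_actions, n_features))   # linear weights per action
        self.e = np.zeros_like(self.w)               # eligibility traces

    def features(self, state):
        # Gaussian RBF activations, normalised so they sum to one (NRBF).
        d2 = np.sum((self.centres - np.asarray(state)) ** 2, axis=1)
        phi = np.exp(-d2 / (2.0 * self.widths ** 2))
        return phi / (phi.sum() + 1e-12)

    def q_values(self, phi):
        # Action values are linear in the normalised features.
        return self.w @ phi

    def select_action(self, state):
        # Epsilon-greedy action selection over the approximated Q-values.
        phi = self.features(state)
        if np.random.rand() < self.epsilon:
            return np.random.randint(self.n_actions), phi
        return int(np.argmax(self.q_values(phi))), phi

    def update(self, phi, action, reward, next_phi, next_action, done):
        # One Sarsa(lambda) gradient-descent step with replacing traces.
        delta = reward - self.q_values(phi)[action]
        if not done:
            delta += self.gamma * self.q_values(next_phi)[next_action]
        self.e *= self.gamma * self.lam
        self.e[action] = np.maximum(self.e[action], phi)   # replacing traces
        self.w += self.alpha * delta * self.e
        if done:
            self.e.fill(0.0)                               # reset between episodes
```

In use, the agent would call select_action on the current state, step the environment, call select_action on the next state, and then call update with both feature vectors; the lightweight character of the approach comes from the fact that each step is only a handful of vector operations over the feature layer.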
