Semi-online neural-Q-learning for real-time robot learning

Reinforcement learning (RL) is a well-suited technique for robot learning, as it can operate in unknown environments and under real-time computational constraints. The main difficulties in adapting classic RL algorithms to robotic systems are the generalization problem and the correct observation of the Markovian state. This paper addresses the generalization problem by proposing the semi-online neural-Q-learning algorithm (SONQL). The algorithm uses the classic Q-learning technique with two modifications. First, a neural network (NN) approximates the Q-function, allowing the use of continuous states and actions. Second, a database of the most representative learning samples accelerates and stabilizes convergence. The term semi-online refers to the fact that the algorithm uses not only the current learning sample but also past ones. Nevertheless, the algorithm is able to learn in real time while the robot interacts with the environment. The paper presents simulated results on the "mountain-car" benchmark as well as real results with an underwater robot performing a target-following behavior.
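The two modifications described above (an NN approximating the Q-function and a database of past samples reused at every update) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the network shape, hyperparameters, and the `NeuralQ` class and its methods are assumptions chosen for brevity.

```python
import random
import numpy as np

class NeuralQ:
    """Sketch of the SONQL idea: a one-hidden-layer network approximates
    the Q-function, and a bounded database of learning samples is replayed
    at every update (the "semi-online" part). All settings are illustrative."""

    def __init__(self, n_states, n_actions, hidden=16, lr=0.05,
                 gamma=0.9, db_size=200, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_states, hidden))
        self.W2 = rng.normal(0.0, 0.1, (hidden, n_actions))
        self.lr, self.gamma = lr, gamma
        self.db, self.db_size = [], db_size

    def q(self, s):
        # forward pass: Q-values for every action, plus hidden activations
        h = np.tanh(s @ self.W1)
        return h @ self.W2, h

    def observe(self, s, a, r, s2):
        # store the current learning sample; keep only the db_size newest
        self.db.append((s, a, r, s2))
        if len(self.db) > self.db_size:
            self.db.pop(0)

    def update(self, batch=8):
        # semi-online update: learn from current *and* past samples
        for s, a, r, s2 in random.sample(self.db, min(batch, len(self.db))):
            qs, h = self.q(s)
            target = r + self.gamma * np.max(self.q(s2)[0])
            err = target - qs[a]              # TD error for the taken action
            # one gradient step on the squared TD error (backprop by hand)
            g_out = np.zeros_like(qs)
            g_out[a] = err
            self.W2 += self.lr * np.outer(h, g_out)
            g_h = (self.W2 @ g_out) * (1.0 - h ** 2)
            self.W1 += self.lr * np.outer(s, g_h)
```

Because each update draws from the whole database rather than only the latest transition, representative samples are revisited many times, which is what accelerates and stabilizes convergence relative to purely online neural Q-learning.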
