A Cat-Like Robot Real-Time Learning to Run

Actor-Critics constitute an important class of reinforcement learning algorithms that can deal with continuous actions and states in an easy and natural way. In their original, sequential form, these algorithms are usually too slow to be applicable to real-life problems. However, they can be augmented with the technique of experience replay to obtain a satisfactory speed of learning without degrading their convergence properties. In this paper, experimental results are presented which show that the combination of experience replay and Actor-Critics yields very fast learning algorithms that achieve successful policies for nontrivial control tasks in a remarkably short time. Namely, a policy for a model of a 6-degree-of-freedom walking robot is obtained after 4 hours of the robot's time.
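The abstract describes the combination of an Actor-Critic with experience replay only at a high level. Below is a minimal sketch of that idea, assuming a linear-Gaussian actor, a linear critic, a toy 1-D control task, and truncated importance-sampling weights for replayed transitions; the task, features, step sizes, and truncation bound are illustrative assumptions, not the paper's actual setup.

```python
# Minimal sketch: actor-critic learning with experience replay on a toy
# 1-D continuous control task (drive a point mass toward the origin).
# Illustrative reconstruction only; task, features, and constants are assumptions.

import numpy as np

rng = np.random.default_rng(0)

GAMMA = 0.95          # discount factor
SIGMA = 0.3           # fixed exploration noise of the Gaussian policy
RHO_MAX = 2.0         # truncation bound for importance-sampling ratios (assumption)

theta = np.zeros(2)   # actor weights: action mean = theta @ phi(s)
w = np.zeros(2)       # critic weights: V(s) = w @ phi(s)


def phi(s):
    """Simple state features: [state, bias]."""
    return np.array([s, 1.0])


def step(s, a):
    """Toy dynamics: next state and reward (quadratic cost)."""
    s_next = np.clip(s + a, -2.0, 2.0)
    reward = -s_next ** 2
    return s_next, reward


replay = []           # list of (s, a, r, s_next, mu_old) transitions

s = rng.uniform(-2, 2)
for t in range(20000):
    mu = theta @ phi(s)
    a = mu + SIGMA * rng.standard_normal()
    s_next, r = step(s, a)
    replay.append((s, a, r, s_next, mu))
    if len(replay) > 5000:
        replay.pop(0)

    # Replay a small batch of past transitions at every control step.
    for _ in range(4):
        s_i, a_i, r_i, sn_i, mu_old = replay[rng.integers(len(replay))]
        # Truncated importance ratio between current and behavior policy.
        mu_now = theta @ phi(s_i)
        log_ratio = (-(a_i - mu_now) ** 2 + (a_i - mu_old) ** 2) / (2 * SIGMA ** 2)
        rho = min(np.exp(log_ratio), RHO_MAX)

        delta = r_i + GAMMA * (w @ phi(sn_i)) - (w @ phi(s_i))  # TD error
        w += 0.01 * rho * delta * phi(s_i)                      # critic step
        # Actor step: likelihood-ratio policy gradient scaled by the TD error.
        theta += 0.001 * rho * delta * (a_i - mu_now) / SIGMA ** 2 * phi(s_i)

    s = s_next
    if t % 200 == 0:          # occasional restart, as in episodic training
        s = rng.uniform(-2, 2)

print("learned actor weights:", theta)   # should roughly steer the state toward 0
```

Replaying each stored transition with a truncated importance weight keeps the off-policy correction bounded, which is what allows the actor and critic to reuse old experience many times without destabilizing the updates.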
