Neural Reinforcement Learning Controllers for a Real Robot Application

Accurate and fast control of wheel speeds in the presence of noise and nonlinearities is a crucial requirement for building fast mobile robots such as those required in the Middle Size League of RoboCup. We describe how highly effective speed controllers can be learned from scratch directly on the real robot. Our recently developed neural fitted Q iteration scheme allows neural controllers to be learned by reinforcement learning from only a limited amount of training data. In the described application, less than 5 minutes of interaction with the real robot was sufficient to learn fast and accurate control to arbitrary target speeds.
