Using policy gradient reinforcement learning on autonomous robot controllers

Robot programmers can often quickly program a robot to approximately execute a task under specific environment conditions. However, achieving robust performance under more general conditions is significantly more difficult. We propose a framework that starts with an existing control system and uses reinforcement feedback from the environment to autonomously improve the controller's performance. We use the policy gradient reinforcement learning (PGRL) framework, which estimates a gradient of the expected reward in the space of controller parameters, allowing those parameters to be incrementally updated toward locally optimal performance. Our approach is experimentally verified on a Cye robot executing a room entry and observation task, showing a significant reduction in task execution time and robustness to unmodeled changes in the environment.
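The core PGRL loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the controller is a real-valued parameter vector and that a `reward_fn` (hypothetical here) runs one task episode with the given parameters and returns a scalar reward. The gradient is estimated by central differences along random directions, and the parameters are updated incrementally by gradient ascent.

```python
import numpy as np

def estimate_gradient(theta, reward_fn, rng, eps=0.05, n_samples=8):
    """Finite-difference estimate of the reward gradient in controller-parameter space.

    Each sample perturbs the parameters along a random unit direction and
    measures the reward slope with a central difference.
    """
    grad = np.zeros_like(theta)
    for _ in range(n_samples):
        d = rng.standard_normal(theta.shape)
        d /= np.linalg.norm(d)
        slope = (reward_fn(theta + eps * d) - reward_fn(theta - eps * d)) / (2 * eps)
        grad += slope * d
    return grad / n_samples

def improve_controller(theta, reward_fn, lr=0.1, steps=200, seed=0):
    """Incrementally ascend the estimated reward gradient from an initial controller."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    for _ in range(steps):
        theta = theta + lr * estimate_gradient(theta, reward_fn, rng)
    return theta
```

In practice each `reward_fn` evaluation would be a physical trial on the robot, so sample efficiency (few perturbations per update) matters far more than in this toy sketch.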
