Reinforcement learning algorithms for robotic navigation in dynamic environments

A problem of primary importance in reinforcement learning (RL) is the tradeoff between exploration and exploitation; balancing the two becomes even more critical for an intelligent agent operating in dynamic environments. Three methods are proposed to improve the performance of traditional RL algorithms such as Q-learning: the addition of a forgetting mechanism, the use of feature-based state inputs, and the addition of a hierarchical structure to the RL agent. Experimental results are presented and used to evaluate the proposed methods, which are compared against established algorithms and against theoretically optimal solutions.
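To make the first proposed improvement concrete, the sketch below shows one plausible form a forgetting mechanism could take on top of tabular Q-learning: after each update, every stored Q-value decays slightly toward zero, so estimates learned under old environment dynamics fade and the agent keeps exploring as the environment changes. This is an illustrative assumption, not the paper's exact algorithm; the `env_step` interface, the decay-toward-zero rule, and all parameter values are hypothetical.

```python
import random
from collections import defaultdict

def q_learning_with_forgetting(env_step, actions, episodes=100,
                               alpha=0.1, gamma=0.9, epsilon=0.1,
                               forget=0.001):
    """Tabular Q-learning with a simple forgetting mechanism (a sketch:
    after each update, all Q-values decay toward zero so that stale
    estimates fade in a dynamic environment)."""
    Q = defaultdict(float)  # maps (state, action) -> value estimate
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # epsilon-greedy exploration/exploitation tradeoff
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env_step(state, action)
            # standard Q-learning bootstrap update
            best_next = max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += alpha * (
                reward + gamma * best_next - Q[(state, action)])
            # forgetting: decay every stored estimate slightly
            for key in Q:
                Q[key] *= (1.0 - forget)
            state = next_state
    return Q
```

With `forget=0`, this reduces to ordinary Q-learning; a larger `forget` trades asymptotic accuracy in a stationary environment for faster adaptation when the reward structure shifts.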
