A Deterministic Improved Q-Learning for Path Planning of a Mobile Robot

This paper presents a new deterministic Q-learning algorithm that presumes knowledge of the distance from the current state to both the next state and the goal. Using four derived properties of Q-learning, this knowledge allows each entry in the Q-table to be updated only once, rather than repeatedly as in classical Q-learning. As a result, the proposed algorithm has a negligibly small time complexity compared with its classical counterpart. Furthermore, it stores only the Q-value of the best action at each state, which yields significant storage savings. Experiments on a simulated maze and on real platforms confirm that, when the resulting Q-table is used for mobile-robot path planning, the proposed algorithm outperforms both the classical and the extended Q-learning on three metrics: traversal time, number of states traversed, and number of 90° turns required. The reduction in 90° turns lowers energy consumption and is therefore of practical importance in robotics.
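For context, the classical Q-learning baseline referred to above revisits every state-action entry repeatedly until the table converges; the proposed method instead writes each entry once using the presumed distance information. The sketch below shows only the classical baseline on a deterministic grid maze, since the abstract does not state the four derived properties needed to reproduce the one-shot update. The grid layout, obstacle cells, reward values, and hyper-parameters are illustrative assumptions, not values taken from the paper.

```python
# Minimal classical Q-learning on a deterministic grid maze (illustrative baseline).
# Grid, obstacles, rewards, and hyper-parameters below are assumptions for
# demonstration only; they are not taken from the paper.
import random

ROWS, COLS = 5, 5
GOAL = (4, 4)
OBSTACLES = {(1, 1), (2, 3), (3, 1)}          # assumed obstacle cells
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right (90-degree moves)
ALPHA, GAMMA, EPSILON, EPISODES = 0.5, 0.9, 0.2, 500

def step(state, action):
    """Deterministic transition: blocked moves leave the state unchanged."""
    r, c = state[0] + action[0], state[1] + action[1]
    if not (0 <= r < ROWS and 0 <= c < COLS) or (r, c) in OBSTACLES:
        return state, -1.0            # penalty for hitting a wall or an obstacle
    if (r, c) == GOAL:
        return (r, c), 100.0          # reward only at the goal
    return (r, c), -0.1               # small per-step cost

# Classical Q-learning: every (state, action) entry is updated repeatedly.
Q = {(r, c): [0.0] * len(ACTIONS) for r in range(ROWS) for c in range(COLS)}

for _ in range(EPISODES):
    state = (0, 0)
    while state != GOAL:
        if random.random() < EPSILON:
            a = random.randrange(len(ACTIONS))                       # explore
        else:
            a = max(range(len(ACTIONS)), key=lambda i: Q[state][i])  # exploit
        nxt, reward = step(state, ACTIONS[a])
        # Standard temporal-difference update, applied again and again
        # to the same entries until the table converges.
        Q[state][a] += ALPHA * (reward + GAMMA * max(Q[nxt]) - Q[state][a])
        state = nxt

# Greedy roll-out of the learned policy from the start cell.
state, path = (0, 0), [(0, 0)]
while state != GOAL and len(path) < ROWS * COLS:
    a = max(range(len(ACTIONS)), key=lambda i: Q[state][i])
    state, _ = step(state, ACTIONS[a])
    path.append(state)
print("Greedy path:", path)
```

In the deterministic variant described by the abstract, the repeated temporal-difference update in the inner loop would be replaced by a single, distance-based assignment per entry, and only the best action's Q-value would be retained per state; the exact update rule depends on the paper's four derived properties and is not sketched here.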
