A Deterministic Improved Q-Learning for Path Planning of a Mobile Robot

This paper presents a new deterministic Q-learning algorithm that presumes knowledge of the distance from the current state to both the next state and the goal. Using four derived properties of Q-learning, this knowledge allows each entry in the Q-table to be updated only once, rather than repeatedly as in classical Q-learning. As a result, the proposed algorithm has a negligibly small time complexity compared with its classical counterpart. Furthermore, it stores only the Q-value of the best action at each state, which yields significant storage savings. Experiments on a simulated maze and on real platforms confirm that, when the resulting Q-table is used for mobile-robot path planning, the proposed algorithm outperforms both the classical and the extended Q-learning on three metrics: traversal time, number of states traversed, and number of 90° turns required. The reduction in 90° turns lowers energy consumption and is therefore of practical importance in robotics.
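For context, the classical Q-learning baseline referred to above revisits every state-action entry repeatedly until the table converges; the proposed method instead writes each entry once using the presumed distance information. The sketch below shows only the classical baseline on a deterministic grid maze, since the abstract does not state the four derived properties needed to reproduce the one-shot update. The grid layout, obstacle cells, reward values, and hyper-parameters are illustrative assumptions, not values taken from the paper.

```python
# Minimal classical Q-learning on a deterministic grid maze (illustrative baseline).
# Grid, obstacles, rewards, and hyper-parameters below are assumptions for
# demonstration only; they are not taken from the paper.
import random

ROWS, COLS = 5, 5
GOAL = (4, 4)
OBSTACLES = {(1, 1), (2, 3), (3, 1)}          # assumed obstacle cells
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right (90-degree moves)
ALPHA, GAMMA, EPSILON, EPISODES = 0.5, 0.9, 0.2, 500

def step(state, action):
    """Deterministic transition: blocked moves leave the state unchanged."""
    r, c = state[0] + action[0], state[1] + action[1]
    if not (0 <= r < ROWS and 0 <= c < COLS) or (r, c) in OBSTACLES:
        return state, -1.0            # penalty for hitting a wall or an obstacle
    if (r, c) == GOAL:
        return (r, c), 100.0          # reward only at the goal
    return (r, c), -0.1               # small per-step cost

# Classical Q-learning: every (state, action) entry is updated repeatedly.
Q = {(r, c): [0.0] * len(ACTIONS) for r in range(ROWS) for c in range(COLS)}

for _ in range(EPISODES):
    state = (0, 0)
    while state != GOAL:
        if random.random() < EPSILON:
            a = random.randrange(len(ACTIONS))                       # explore
        else:
            a = max(range(len(ACTIONS)), key=lambda i: Q[state][i])  # exploit
        nxt, reward = step(state, ACTIONS[a])
        # Standard temporal-difference update, applied again and again
        # to the same entries until the table converges.
        Q[state][a] += ALPHA * (reward + GAMMA * max(Q[nxt]) - Q[state][a])
        state = nxt

# Greedy roll-out of the learned policy from the start cell.
state, path = (0, 0), [(0, 0)]
while state != GOAL and len(path) < ROWS * COLS:
    a = max(range(len(ACTIONS)), key=lambda i: Q[state][i])
    state, _ = step(state, ACTIONS[a])
    path.append(state)
print("Greedy path:", path)
```

In the deterministic variant described by the abstract, the repeated temporal-difference update in the inner loop would be replaced by a single, distance-based assignment per entry, and only the best action's Q-value would be retained per state; the exact update rule depends on the paper's four derived properties and is not sketched here.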
