An efficient initialization approach of Q-learning for mobile robots

This article demonstrates that Q-learning can be accelerated by appropriately specifying the initial Q-values using a dynamic wave expansion neural network. In our method, the neural network has the same topology as the robot's workspace: each neuron corresponds to a discrete state. Driven by the initial environment information, every neuron of the network settles into an equilibrium state. Once the network is stable, the activity of a given neuron represents the maximum cumulative reward obtainable by following the optimal policy from the corresponding state. The initial Q-value of a state-action pair is then defined as the immediate reward plus the maximum cumulative reward obtained by following the optimal policy from the succeeding state. In this way, the neural network establishes a mapping from the known environment information to the initial values of the Q-table, so prior knowledge is incorporated into the learning system and gives the robot a better starting point for learning. Experiments on a grid-world problem show that the neural network-based Q-learning enables a robot to acquire an optimal policy with better learning performance than conventional Q-learning and potential field-based Q-learning.

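The sketch below illustrates the initialization idea on a toy grid world. It is not the paper's implementation: the grid layout, reward values, discount factor, and learning parameters are assumptions chosen for illustration, and a plain value-iteration sweep stands in for the dynamic wave expansion network (the abstract describes the stable neuron activities as the maximum cumulative reward from each state, which is the quantity the sweep computes here). The initial Q-table then sets Q0(s, a) to the immediate reward plus the discounted value of the successor state, and ordinary tabular Q-learning runs on top of it.

```python
import random

import numpy as np

# Toy grid world used only for illustration: 0 = free cell, 1 = obstacle.
GRID = np.array([
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 0, 0],
])
GOAL = (4, 4)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]        # up, down, left, right
GOAL_REWARD, STEP_REWARD, GAMMA = 1.0, 0.0, 0.9     # assumed parameters


def free(r, c):
    return 0 <= r < 5 and 0 <= c < 5 and GRID[r, c] == 0


def step(state, action):
    """Deterministic move; hitting a wall or an obstacle leaves the state unchanged."""
    nr, nc = state[0] + action[0], state[1] + action[1]
    nxt = (nr, nc) if free(nr, nc) else state
    return nxt, (GOAL_REWARD if nxt == GOAL else STEP_REWARD)


def equilibrium_activities(sweeps=100):
    """Propagate activity outward from the goal cell until the values settle.

    A value-iteration sweep stands in for the dynamic wave expansion network;
    at equilibrium V[s] approximates the maximum discounted return from state s.
    """
    V = np.zeros(GRID.shape)
    for _ in range(sweeps):
        for r in range(5):
            for c in range(5):
                if not free(r, c) or (r, c) == GOAL:
                    continue
                V[r, c] = max(rew + GAMMA * V[nxt]
                              for nxt, rew in (step((r, c), a) for a in ACTIONS))
    return V


def initial_q_table(V):
    """Q0(s, a) = immediate reward + discounted equilibrium value of the successor."""
    Q = {}
    for r in range(5):
        for c in range(5):
            if not free(r, c):
                continue
            for i, a in enumerate(ACTIONS):
                nxt, rew = step((r, c), a)
                Q[((r, c), i)] = rew + GAMMA * V[nxt]
    return Q


def q_learning(Q, episodes=200, alpha=0.1, eps=0.1, max_steps=100):
    """Standard tabular Q-learning starting from the initialized table."""
    for _ in range(episodes):
        s = (0, 0)
        for _ in range(max_steps):
            if s == GOAL:
                break
            i = (random.randrange(4) if random.random() < eps
                 else max(range(4), key=lambda k: Q[(s, k)]))
            nxt, rew = step(s, ACTIONS[i])
            target = 0.0 if nxt == GOAL else max(Q[(nxt, k)] for k in range(4))
            Q[(s, i)] += alpha * (rew + GAMMA * target - Q[(s, i)])
            s = nxt
    return Q


Q0 = initial_q_table(equilibrium_activities())
learned_Q = q_learning(dict(Q0))
```

In this fully known, static toy grid the initialization already coincides with the optimal Q-table, so the subsequent Q-learning phase changes little; the online learning step matters when the environment is only partially known or changes, which appears to be the setting the paper targets.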