Swarm- and pheromone-based reinforcement learning methods for the robot(s) path-search problem

With the world moving toward automation, robots are finding applications in almost every domain to reduce human effort. One such domain is finding a path through an unknown and hostile environment to reach a goal. Due to the complexity of many tasks in this domain, it is difficult for robots (agents) to solve them with pre-programmed behaviors; instead, agents must discover a solution on their own through learning. In simple reinforcement learning algorithms, a single agent learns how to achieve a goal over many episodes. However, as the complexity of the learning problem or the number of agents increases, obtaining the optimal policy may require much more computation time, and in some cases the agent may never reach the goal at all. For optimization problems, bio-inspired multi-agent search methods such as particle swarm optimization and ant colony optimization are recognized for rapidly finding a global optimum of multi-modal functions over wide solution spaces. This paper proposes a SARSA-based reinforcement learning algorithm, called Phe-SARSA, for one and two agents, in which the agents are guided by pheromone levels. In this algorithm, agents learn not only from their own experience but also from the pheromone trails left by other agents while searching for the shortest path. The algorithms were simulated in MATLAB 2013a, and the results were compared with the Q-learning, SARSA, Q-Swarm, SARSA-Swarm, and Phe-Q algorithms.
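The abstract does not give the exact update rules of Phe-SARSA, but the general idea of combining a standard on-policy SARSA update with ant-style pheromone guidance can be illustrated. The sketch below is an assumption-laden toy version, not the paper's algorithm: a deterministic 5x5 grid world, epsilon-greedy action selection biased by a pheromone bonus (weight `beta`), pheromone deposited only along goal-reaching paths (stronger for shorter paths) and evaporated each episode (rate `rho`). All names and parameter values are illustrative.

```python
import random

# Toy 5x5 grid world: start (0,0), goal (4,4), reward 1 at the goal,
# small step cost otherwise. This is a hypothetical sketch of a
# pheromone-guided SARSA, not the paper's exact Phe-SARSA algorithm.
SIZE = 5
GOAL = (SIZE - 1, SIZE - 1)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up

def step(state, action):
    r, c = state
    dr, dc = ACTIONS[action]
    ns = (max(0, min(SIZE - 1, r + dr)), max(0, min(SIZE - 1, c + dc)))
    reward = 1.0 if ns == GOAL else -0.01
    return ns, reward, ns == GOAL

def choose_action(Q, phero, state, eps, beta):
    """Epsilon-greedy selection; greedy score = Q + pheromone bonus."""
    if random.random() < eps:
        return random.randrange(len(ACTIONS))
    def score(a):
        ns, _, _ = step(state, a)
        return Q[state][a] + beta * phero.get(ns, 0.0)
    return max(range(len(ACTIONS)), key=score)

def train(episodes=500, alpha=0.5, gamma=0.95, eps=0.1, beta=0.05, rho=0.99):
    Q = {(r, c): [0.0] * len(ACTIONS)
         for r in range(SIZE) for c in range(SIZE)}
    phero = {}  # shared pheromone map other agents could also read/write
    for _ in range(episodes):
        s, path = (0, 0), []
        a = choose_action(Q, phero, s, eps, beta)
        for _ in range(200):
            ns, reward, done = step(s, a)
            na = choose_action(Q, phero, ns, eps, beta)
            # Standard on-policy SARSA update.
            Q[s][a] += alpha * (reward + gamma * Q[ns][na] - Q[s][a])
            path.append(ns)
            s, a = ns, na
            if done:
                break
        for cell in phero:            # evaporation
            phero[cell] *= rho
        if done:                      # deposit only on successful paths;
            for cell in path:         # shorter paths leave stronger trails
                phero[cell] = phero.get(cell, 0.0) + 1.0 / len(path)
    return Q, phero

random.seed(0)
Q, phero = train()
# After training, the greedy action at the start moves right or down,
# i.e. toward the goal.
print(max(range(len(ACTIONS)), key=lambda a: Q[(0, 0)][a]))
```

In a multi-agent setting, the pheromone map `phero` would be the shared medium: each agent deposits on its own successful paths and reads the trails of the others during action selection, which is the indirect coupling the abstract describes.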
