Efficient, swarm-based path finding in unknown graphs using reinforcement learning

This paper addresses the problem of steering a swarm of autonomous agents out of an unknown maze to a goal at an unknown location. The problem is of particular interest when no direct communication between the agents is possible and all information must be exchanged indirectly, through information “deposited” in the environment. To address this task, the paper introduces an ε-greedy, collaborative reinforcement learning method that uses only local information exchange to balance exploitation and exploration in the unknown maze and to optimize the swarm's ability to exit it. The learning and routing algorithm provides a mechanism for storing the data needed to represent a collaborative utility function, based on the experiences of agents that previously visited a node, so that routing decisions improve over time. Two theorems establish the theoretical soundness of the proposed learning method and illustrate the importance of the stored information in improving routing decisions. Simulation examples show that these simple rules for learning from past experience significantly improve performance over random search and over search based on Ant Colony Optimization, a metaheuristic algorithm.
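The idea of ε-greedy routing over values deposited at nodes can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not state the utility function or its update rule, so the sketch assumes a simple value-iteration-style update (a cost of 1 per step, each node's value set to one step worse than its best neighbour). The names `step`, `run_agent`, `store`, and the four-node example maze are all hypothetical.

```python
import random

def step(node, neighbors, store, epsilon, rng):
    """Deposit an updated value at `node`, then pick the next node epsilon-greedily."""
    options = neighbors[node]
    # Read the values deposited at adjacent nodes (local information only).
    # max() returns the first maximal option, so ties break deterministically.
    best = max(options, key=lambda n: store[n])
    # ASSUMED update rule: this node is one step worse than its best neighbour.
    store[node] = -1.0 + store[best]
    if rng.random() < epsilon:
        return rng.choice(options)  # explore: uniformly random neighbour
    return best                     # exploit: best-valued neighbour

def run_agent(start, goal, neighbors, store, epsilon, rng, max_steps=10_000):
    """Walk one agent from `start` until it reaches `goal` or exhausts `max_steps`."""
    path = [start]
    while path[-1] != goal and len(path) <= max_steps:
        path.append(step(path[-1], neighbors, store, epsilon, rng))
    return path

# Hypothetical maze: node 1 is a dead end, node 3 is the exit.
neighbors = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
store = {n: 0.0 for n in neighbors}  # shared values "deposited" in the environment
rng = random.Random(0)

# epsilon = 0 keeps the demonstration deterministic (pure exploitation).
first = run_agent(0, 3, neighbors, store, epsilon=0.0, rng=rng)
second = run_agent(0, 3, neighbors, store, epsilon=0.0, rng=rng)
print(first)   # [0, 1, 0, 2, 3]: the first agent wanders into the dead end
print(second)  # [0, 2, 3]: the second agent avoids it using the stored values
```

In this sketch the agents never communicate directly; the second agent benefits only from the values the first one left behind. Setting `epsilon` above zero would let later agents occasionally try non-greedy moves, which is the exploration side of the trade-off the abstract describes.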
