Swarm reinforcement learning method based on ant colony optimization

In ordinary reinforcement learning methods, a single agent learns to achieve a goal through many episodes. Because the agent essentially learns by trial and error, acquiring an optimal policy takes a long computation time, especially for complicated learning problems. Meanwhile, in optimization, population-based methods such as particle swarm optimization are recognized as able to rapidly find the global optimum of multimodal functions with wide solution spaces. We recently proposed swarm reinforcement learning methods in which multiple agents are prepared and learn not only from their own experiences but also by exchanging information with one another. In these methods, a key design question is how the agents should exchange information. In this paper, we propose a swarm reinforcement learning method based on ant colony optimization, an optimization method inspired by how real ants use trail pheromones, in order to acquire the optimal policy rapidly even for complicated reinforcement learning problems. In the proposed method, the agents exchange information through Pheromone-Q values, which we define so that they play the same role as trail pheromones. The proposed method is applied to shortest path problems, and its performance is demonstrated through numerical experiments.
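The abstract does not state the update rule for Pheromone-Q values, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes several independent tabular Q-learning agents on a toy shortest-path graph that share one ant-colony-style pheromone table: after each episode the pheromone evaporates at rate rho, and each agent deposits 1/length on the edges it walked. Action selection uses the assumed rule "Q plus beta times pheromone" with epsilon-greedy exploration. The graph, the names GRAPH, choose, swarm_q_learning and greedy_path, the combination rule, and every parameter value are hypothetical.

```python
import random

# Hypothetical toy shortest-path task: node -> {neighbor: edge cost}.
# "G" is the goal. The graph is acyclic, so every episode terminates.
# The shortest route is S -> A -> B -> G with total cost 3.
GRAPH = {
    "S": {"B": 4, "A": 1},
    "A": {"G": 5, "B": 1},
    "B": {"G": 1},
    "G": {},
}
START, GOAL = "S", "G"
EDGES = [(s, a) for s in GRAPH for a in GRAPH[s]]


def choose(q, tau, s, beta, eps, rng):
    """Epsilon-greedy choice on Q plus a pheromone bonus (assumed rule)."""
    actions = list(GRAPH[s])
    if rng.random() < eps:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(s, a)] + beta * tau[(s, a)])


def swarm_q_learning(n_agents=5, episodes=300, alpha=0.5, gamma=1.0,
                     eps=0.2, beta=0.1, rho=0.1, tau0=1.0, seed=0):
    rng = random.Random(seed)
    # Each agent has its own Q-table; all agents share one pheromone table.
    qs = [{e: 0.0 for e in EDGES} for _ in range(n_agents)]
    tau = {e: tau0 for e in EDGES}
    for _ in range(episodes):
        walks = []
        for q in qs:
            s, path, length = START, [], 0.0
            while s != GOAL:
                a = choose(q, tau, s, beta, eps, rng)
                cost = GRAPH[s][a]
                # One-step Q-learning update with reward -cost;
                # the goal is terminal, so its value is 0.
                future = max((q[(a, b)] for b in GRAPH[a]), default=0.0)
                q[(s, a)] += alpha * (-cost + gamma * future - q[(s, a)])
                path.append((s, a))
                length += cost
                s = a
            walks.append((path, length))
        # ACO-style pheromone update: evaporate every edge, then each agent
        # deposits 1/length on the edges it used (shorter walk, more pheromone).
        for e in tau:
            tau[e] *= 1.0 - rho
        for path, length in walks:
            for e in path:
                tau[e] += 1.0 / length
    return qs, tau


def greedy_path(q):
    """Follow the highest-valued action from START until GOAL."""
    s, path = START, [START]
    while s != GOAL:
        s = max(GRAPH[s], key=lambda a: q[(s, a)])
        path.append(s)
    return path


if __name__ == "__main__":
    qs, tau = swarm_q_learning()
    print(greedy_path(qs[0]))
    print({e: round(v, 2) for e, v in tau.items()})
```

Each agent still learns its own Q-table by trial and error; in this sketch the shared pheromone table is the only channel of information exchange, and beta controls how strongly it biases action selection. Setting beta=0 reduces the sketch to independent Q-learners.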