In ordinary reinforcement learning algorithms, a single agent learns to achieve a goal over many episodes. If the learning problem is complicated, acquiring the optimal policy may take considerable computation time. Meanwhile, in optimization, population-based methods such as particle swarm optimization are recognized as able to rapidly find the global optimum of multi-modal functions over wide solution spaces. We recently proposed reinforcement learning algorithms in which multiple agents learn not only from their own experiences but also by exchanging information with one another. In these algorithms, the design of the information exchange method is important. This paper proposes several information exchange methods based on the update equations of particle swarm optimization. The proposed algorithms using these methods are applied to a shortest path problem, and their performance is compared through numerical experiments.
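To make the idea of an exchange step based on the particle swarm optimization update equations concrete, the following is a minimal sketch, not the method of this paper. It assumes that each agent's Q-table plays the role of a particle position, that each agent keeps a velocity table of the same shape, and that a personal-best table `pbest` and a swarm-best table `gbest` have already been chosen by some evaluation that is not specified here. The function name `pso_exchange` and the parameter values `w`, `c1`, and `c2` are hypothetical choices made for illustration only.

```python
import random


def pso_exchange(q, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=random):
    """Apply one standard PSO velocity/position update to a Q-table.

    q, v, pbest, gbest are lists of rows of floats with identical shape
    (rows = states, columns = actions). This layout is an assumption of
    the sketch, not something stated in the abstract.
    """
    new_q, new_v = [], []
    for q_row, v_row, p_row, g_row in zip(q, v, pbest, gbest):
        q_out, v_out = [], []
        for q_sa, v_sa, p_sa, g_sa in zip(q_row, v_row, p_row, g_row):
            # Fresh uniform random factors in [0, 1) for every entry.
            r1, r2 = rng.random(), rng.random()
            # Inertia term + pull toward personal best + pull toward swarm best.
            vel = w * v_sa + c1 * r1 * (p_sa - q_sa) + c2 * r2 * (g_sa - q_sa)
            v_out.append(vel)
            q_out.append(q_sa + vel)
        new_q.append(q_out)
        new_v.append(v_out)
    return new_q, new_v


if __name__ == "__main__":
    rng = random.Random(0)
    q = [[0.0, 0.0], [0.0, 0.0]]      # this agent's current Q-values
    v = [[0.0, 0.0], [0.0, 0.0]]      # velocities start at zero
    pbest = [[0.5, 0.0], [0.0, 0.5]]  # hypothetical personal best
    gbest = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical swarm best
    q, v = pso_exchange(q, v, pbest, gbest, rng=rng)
    print(q)
```

In a full algorithm, a step like this would presumably be interleaved with each agent's ordinary Q-learning updates, so that agents learn from their own episodes and are also drawn toward the Q-values of better-performing agents. How often to exchange and how to rank agents are design choices this sketch leaves open.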