Swarm reinforcement learning methods for problems with continuous state-action space

We recently proposed swarm reinforcement learning methods in which multiple agent-environment pairs are prepared, and each agent learns not only by individually performing a standard reinforcement learning method but also by exchanging information with the other agents. In those methods, Q-learning was used for the individual learning, and they were applied to a problem with a discrete state-action space. In the real world, however, many problems are formulated with a continuous state-action space. This paper proposes swarm reinforcement learning methods based on an actor-critic method in order to acquire optimal policies rapidly for problems with a continuous state-action space. The proposed methods are applied to a biped robot control problem, and their performance is examined through numerical experiments.
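The abstract describes the scheme only at a high level: several agent-environment pairs, each running an actor-critic learner, plus some exchange of information among the agents. The Python sketch below illustrates that structure under explicitly stated assumptions. The environment is a hypothetical 1-D "drive the state to the origin" task standing in for the biped robot. The actor is a Gaussian policy with linear mean `w * s`, and the critic is a linear TD(0) value estimate. The exchange rule (every `share_every` episodes, each actor moves a fraction `beta` toward the actor with the best recent reward) is an illustrative choice, not necessarily the rule used in the paper. All names (`ActorCritic`, `train_swarm`, `evaluate`, `beta`, `share_every`) are hypothetical.

```python
import random

# Hypothetical stand-in task (not the paper's biped robot): drive a 1-D state
# toward the origin. The action is added to the state; reward = -(next state)^2.
GAMMA = 0.9    # discount factor
SIGMA = 0.3    # fixed exploration std of the Gaussian actor (assumption)
STEPS = 10     # steps per episode
CLIP = 2.0     # state bounds keep the toy task numerically tame


def step(s, a):
    s_next = max(-CLIP, min(CLIP, s + a))
    return s_next, -s_next ** 2


def features(s):
    # Critic features for a quadratic state-value estimate V(s) = v . phi(s)
    return (1.0, s, s * s)


def evaluate(w, starts=(-1.0, -0.5, 0.5, 1.0)):
    """Discounted return of the noise-free policy a = w * s from fixed starts."""
    total = 0.0
    for s0 in starts:
        s = s0
        for t in range(STEPS):
            s, r = step(s, w * s)
            total += GAMMA ** t * r
    return total


class ActorCritic:
    """One agent: Gaussian actor N(w * s, SIGMA^2) plus a linear TD(0) critic."""

    def __init__(self, rng, alpha_actor=0.005, alpha_critic=0.1):
        self.rng = rng
        self.w = 0.0               # actor parameter (mean action = w * s)
        self.v = [0.0, 0.0, 0.0]   # critic weights
        self.alpha_actor = alpha_actor
        self.alpha_critic = alpha_critic

    def value(self, s):
        return sum(vi * fi for vi, fi in zip(self.v, features(s)))

    def run_episode(self):
        """Learn online for one episode in this agent's own environment."""
        s = self.rng.uniform(-1.0, 1.0)
        total_reward = 0.0
        for _ in range(STEPS):
            eps = self.rng.gauss(0.0, 1.0)
            a = self.w * s + SIGMA * eps            # sample a ~ N(w * s, SIGMA^2)
            s_next, r = step(s, a)
            delta = r + GAMMA * self.value(s_next) - self.value(s)  # TD error
            phi = features(s)
            norm = 1.0 + sum(f * f for f in phi)    # normalized critic step
            self.v = [vi + self.alpha_critic * delta * fi / norm
                      for vi, fi in zip(self.v, phi)]
            # Actor: TD error times d/dw log N(a; w * s, SIGMA^2) = eps * s / SIGMA
            self.w += self.alpha_actor * delta * eps * s / SIGMA
            total_reward += r
            s = s_next
        return total_reward


def train_swarm(n_agents=5, episodes=300, share_every=20, beta=0.5, seed=0):
    rng = random.Random(seed)
    agents = [ActorCritic(random.Random(rng.randrange(2 ** 32)))
              for _ in range(n_agents)]
    recent = [0.0] * n_agents
    for ep in range(1, episodes + 1):
        for i, agent in enumerate(agents):
            recent[i] += agent.run_episode()   # individual actor-critic learning
        if ep % share_every == 0:
            # Assumed exchange rule: each actor moves a fraction beta toward the
            # actor whose agent earned the highest reward since the last exchange.
            best_w = agents[max(range(n_agents), key=lambda i: recent[i])].w
            for agent in agents:
                agent.w += beta * (best_w - agent.w)
            recent = [0.0] * n_agents
    # Report the actor that performs best on the deterministic evaluation.
    return max((agent.w for agent in agents), key=evaluate)


if __name__ == "__main__":
    w = train_swarm()
    print(f"learned gain w = {w:.3f}; the optimum for this toy task is -1")
```

In this toy task the ideal gain is `w = -1`, since the action then cancels the state exactly. Sharing only the actor parameters while each critic keeps learning from its own environment is one simple way to combine individual learning with information exchange; other exchange rules would slot into the same loop.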
