Swarm reinforcement learning methods for problems with continuous state-action space

We recently proposed swarm reinforcement learning methods in which multiple agent-environment pairs are prepared and the agents learn not only individually, through an ordinary reinforcement learning method, but also by exchanging information with one another. In those methods, Q-learning was used for the individual learning, and they were applied to a problem with a discrete state-action space. In the real world, however, many problems are formulated with a continuous state-action space. This paper proposes swarm reinforcement learning methods based on an actor-critic method in order to acquire optimal policies rapidly for problems with a continuous state-action space. The proposed methods are applied to a biped robot control problem, and their performance is examined through numerical experiments.
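The two-phase structure described above (individual learning followed by information exchange) can be illustrated with a minimal sketch. It follows the earlier discrete setting, with tabular Q-learning as the individual learner, rather than the actor-critic variant proposed here. Everything specific in it is an assumption made for illustration: the six-state chain task, the function names (`step`, `run_episode`, `exchange`, `train`, `greedy_policy`), the hyperparameters, and in particular the exchange rule, which simply pulls each agent's Q-table partway toward that of the agent with the best recent episode. It is not the authors' exchange scheme.

```python
import random

N_STATES = 6        # toy chain 0..5; reaching state 5 ends the episode (hypothetical task)
ACTIONS = (-1, +1)  # index 0 = move left, index 1 = move right


def step(state, a):
    """One transition of the toy chain: reward 1.0 only on reaching the last state."""
    nxt = min(max(state + ACTIONS[a], 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def run_episode(q, rng, alpha=0.5, gamma=0.9, eps=0.2, max_steps=100):
    """Individual learning: one episode of tabular Q-learning on the agent's own
    copy of the environment. Returns the discounted return as the agent's evaluation."""
    state, ret = 0, 0.0
    for t in range(max_steps):
        if rng.random() < eps:
            a = rng.randrange(len(ACTIONS))           # explore
        else:
            best = max(q[state])                      # exploit, breaking ties at random
            a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
        nxt, r, done = step(state, a)
        target = r if done else r + gamma * max(q[nxt])
        q[state][a] += alpha * (target - q[state][a])
        ret += (gamma ** t) * r
        state = nxt
        if done:
            break
    return ret


def exchange(qs, scores, c=0.5):
    """Information exchange (assumed rule): move every other agent's Q-values
    a fraction c of the way toward the best-scoring agent's Q-values."""
    best = max(range(len(qs)), key=lambda i: scores[i])
    for i, q in enumerate(qs):
        if i == best:
            continue
        for s in range(N_STATES):
            for a in range(len(ACTIONS)):
                q[s][a] += c * (qs[best][s][a] - q[s][a])
    return best


def train(n_agents=4, n_rounds=50, seed=0):
    """Alternate individual learning and exchange for a fixed number of rounds."""
    rng = random.Random(seed)
    qs = [[[0.0] * len(ACTIONS) for _ in range(N_STATES)] for _ in range(n_agents)]
    for _ in range(n_rounds):
        scores = [run_episode(q, rng) for q in qs]    # each agent learns on its own
        exchange(qs, scores)                          # then the swarm shares information
    return qs


def greedy_policy(q):
    """Greedy action index for each non-terminal state."""
    return [max(range(len(ACTIONS)), key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    for q in train():
        print(greedy_policy(q))
```

In the continuous setting addressed by this paper, the Q-table would be replaced by actor and critic function approximators, and the exchange step would operate on their parameters or outputs instead of on table entries; that part is not sketched here.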
