Parallel reinforcement learning systems using exploration agents and the Dyna-Q algorithm

We propose a new strategy for parallel reinforcement learning with which the optimal value function and policy can be constructed more quickly than with traditional strategies. We define two types of agents: exploitation agents and exploration agents. The exploitation agents select actions mainly for the purpose of exploitation, while the exploration agents concentrate on exploration using the extended k-certainty exploration method. These agents learn in the same environment in parallel, periodically combine their value functions, and execute Dyna-Q. This strategy makes it possible to construct the optimal value function quickly and enables the exploration agents to select optimal actions sooner. Experimental results from a mobile robot simulation demonstrate the applicability of our method.
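The overall scheme can be illustrated with a minimal sketch. The following Python example is illustrative only, not the paper's implementation: it assumes a small deterministic grid world, epsilon-greedy action selection for the exploitation agent, a simplified stand-in for the k-certainty exploration method (prefer actions tried fewer than k times), tabular Dyna-Q planning over a learned model, and Q-table averaging as the periodic combination step. All names, parameter values, and the combination rule are assumptions for illustration.

```python
import random

N = 5                                     # grid size (assumed); goal at (N-1, N-1)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]
GOAL = (N - 1, N - 1)

def step(state, action):
    """Deterministic grid transition with reward 1 only at the goal (assumed env)."""
    x = min(max(state[0] + action[0], 0), N - 1)
    y = min(max(state[1] + action[1], 0), N - 1)
    s2 = (x, y)
    return s2, (1.0 if s2 == GOAL else 0.0)

class DynaQAgent:
    def __init__(self, explorer=False, k=3, eps=0.1, alpha=0.5, gamma=0.95):
        self.Q = {}          # tabular value function
        self.model = {}      # (s, a) -> (s', r): learned model used for planning
        self.counts = {}     # visit counts per (s, a), used by the exploration rule
        self.explorer = explorer
        self.k, self.eps, self.alpha, self.gamma = k, eps, alpha, gamma

    def q(self, s, a):
        return self.Q.get((s, a), 0.0)

    def act(self, s):
        if self.explorer:
            # Simplified k-certainty-style rule: try any action seen < k times.
            under = [a for a in ACTIONS if self.counts.get((s, a), 0) < self.k]
            if under:
                return random.choice(under)
        if random.random() < self.eps:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q(s, a))

    def learn(self, s, a, r, s2, planning_steps=10):
        # Direct Q-learning update from the real experience.
        best = max(self.q(s2, a2) for a2 in ACTIONS)
        self.Q[(s, a)] = self.q(s, a) + self.alpha * (r + self.gamma * best - self.q(s, a))
        self.model[(s, a)] = (s2, r)
        self.counts[(s, a)] = self.counts.get((s, a), 0) + 1
        # Dyna-Q planning: replay random remembered transitions from the model.
        for _ in range(planning_steps):
            (ps, pa), (ps2, pr) = random.choice(list(self.model.items()))
            pbest = max(self.q(ps2, a2) for a2 in ACTIONS)
            self.Q[(ps, pa)] += self.alpha * (pr + self.gamma * pbest - self.q(ps, pa))

def combine(agents):
    """Periodic combination step (assumed here to be Q-value averaging)."""
    keys = set().union(*(ag.Q for ag in agents))
    merged = {key: sum(ag.q(*key) for ag in agents) / len(agents) for key in keys}
    for ag in agents:
        ag.Q = dict(merged)

random.seed(0)
agents = [DynaQAgent(explorer=False), DynaQAgent(explorer=True)]
for episode in range(50):
    for ag in agents:                     # both agents learn in the same environment
        s = (0, 0)
        for _ in range(100):
            a = ag.act(s)
            s2, r = step(s, a)
            ag.learn(s, a, r, s2)
            if s2 == GOAL:
                break
            s = s2
    combine(agents)                       # periodic value-function combination

print(max(agents[0].q((N - 2, N - 1), a) for a in ACTIONS))
```

The exploration agent covers the state space quickly, the reward it finds is propagated through the Dyna-Q planning steps, and the combination step hands the resulting value estimates to the exploitation agent, which is the intended benefit of the parallel strategy.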