Parallel Reinforcement Learning Systems Using Exploration Agents

We propose a new strategy for parallel reinforcement learning ; using this strategy, the optimal value function and policy can be constructed more quickly than by using traditional strategies. We define two types of agents : the exploitation agents and the exploration agents. The exploitation agents select actions mainly for exploitation, and the exploration agents concentrate on exploration using the extended k-certainty exploration method. These agents learn in the same environment in parallel and combine each value function periodically. By using this strategy, the construction of the optimal value function is expected, and the optimal actions can be selected by the exploitation agents quickly. The experimental results of the mobile robot simulation showed the availability of our method.