Parallel reinforcement learning systems using exploration agents and the Dyna-Q algorithm
We propose a new strategy for parallel reinforcement learning with which the optimal value function and policy can be constructed more quickly than with traditional strategies. We define two types of agents: exploitation agents and exploration agents. The exploitation agents select actions mainly for the purpose of exploitation, while the exploration agents concentrate on exploration using the extended k-certainty exploration method. These agents learn in the same environment in parallel, periodically combine their value functions, and execute Dyna-Q. This strategy makes it possible to construct the optimal value function and enables the exploration agents to quickly select the optimal actions. Experimental results from a mobile robot simulation demonstrate the applicability of our method.
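The scheme described above can be sketched in code. The following is a minimal, hypothetical illustration, not the paper's implementation: a toy 1-D corridor MDP, a Dyna-Q agent class, an exploitation agent using epsilon-greedy action selection, an exploration agent that picks the least-tried action (a simple stand-in for the extended k-certainty exploration method), and a periodic merge that averages the agents' Q-tables. All names, the environment, and the hyperparameters are assumptions for the sketch.

```python
import random
from collections import defaultdict

# Toy corridor: states 0..7, actions move left/right, reward 1 at the goal.
N_STATES, ACTIONS, GOAL = 8, (-1, +1), 7

def step(s, a):
    s2 = max(0, min(N_STATES - 1, s + a))
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

class DynaQAgent:
    def __init__(self, explore=False, alpha=0.5, gamma=0.95,
                 eps=0.1, planning=10, seed=0):
        self.Q = defaultdict(float)        # Q[(state, action)]
        self.model = {}                    # learned model: (s, a) -> (r, s2)
        self.counts = defaultdict(int)     # visit counts drive exploration
        self.explore, self.alpha, self.gamma = explore, alpha, gamma
        self.eps, self.planning = eps, planning
        self.rng = random.Random(seed)

    def act(self, s):
        if self.explore:                   # least-certain action first
            return min(ACTIONS, key=lambda a: self.counts[(s, a)])
        if self.rng.random() < self.eps:   # epsilon-greedy exploitation
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.Q[(s, a)])

    def update(self, s, a, r, s2):
        best = max(self.Q[(s2, b)] for b in ACTIONS)
        self.Q[(s, a)] += self.alpha * (r + self.gamma * best - self.Q[(s, a)])
        self.model[(s, a)] = (r, s2)
        self.counts[(s, a)] += 1
        for _ in range(self.planning):     # Dyna-Q planning sweeps on the model
            (ps, pa), (pr, ps2) = self.rng.choice(list(self.model.items()))
            pb = max(self.Q[(ps2, b)] for b in ACTIONS)
            self.Q[(ps, pa)] += self.alpha * (pr + self.gamma * pb - self.Q[(ps, pa)])

def merge(agents):
    """Combine value functions by averaging Q over all visited (s, a) pairs."""
    keys = set().union(*(a.Q for a in agents))
    avg = {k: sum(a.Q[k] for a in agents) / len(agents) for k in keys}
    for a in agents:
        a.Q = defaultdict(float, avg)

agents = [DynaQAgent(explore=False, seed=1),   # exploitation agent
          DynaQAgent(explore=True, seed=2)]    # exploration agent
for episode in range(30):
    for ag in agents:
        s = 0
        for _ in range(50):
            a = ag.act(s)
            s2, r, done = step(s, a)
            ag.update(s, a, r, s2)
            s = s2
            if done:
                break
    if episode % 5 == 4:                   # periodic value-function merge
        merge(agents)

greedy = [max(ACTIONS, key=lambda a: agents[0].Q[(s, a)]) for s in range(GOAL)]
print(greedy)   # learned greedy policy: should move right toward the goal
```

Because Q-learning is off-policy, both agents' Q-tables converge toward the same optimal values despite their different behavior policies, so averaging them during the merge is a reasonable combination rule for this sketch; the paper's actual combination method may differ.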
[1] Shigenobu Kobayashi, et al. k-Certainty Exploration Method: An Action Selector to Identify the Environment in Reinforcement Learning, 1997, Artif. Intell.
[2] Ming Tan, et al. Multi-Agent Reinforcement Learning: Independent versus Cooperative Agents, 1997, ICML.
[3] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[4] Peter Dayan, et al. Q-learning, 1992, Machine Learning.