Proposal of Detour Path Suppression Method in PS Reinforcement Learning and Its Application to Altruistic Multi-agent Environment
Profit Sharing (PS) is a well-known reinforcement learning method. In PS, a reward is generally distributed to the rules selected during an episode by a geometrically decreasing function, and the common ratio of this function is called the discount rate. A large discount rate increases the learning speed, but a non-optimal policy may be learned. Conversely, a small discount rate improves the quality of the learned policy, but learning may not proceed smoothly because the learning depth is shallow. To cope with these problems, this paper proposes a method that reinforces the detour paths and the non-detour path with different discount rates. Finally, the method is applied to an altruistic multi-agent environment to confirm its effectiveness.
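To make the trade-off concrete, here is a minimal sketch of the geometrically decreasing reinforcement function, together with one possible two-rate variant. The abstract does not say how detours are identified or which rate each path receives. This sketch therefore assumes two things: a rule counts as a detour if the agent later returns to the state where it was selected (a loop), and detour rules get the smaller rate. It is an illustration under those assumptions, not the paper's algorithm. The names `ps_update`, `detour_flags`, `detour_suppression_update`, `gamma_path` and `gamma_detour` are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, List, Tuple

Rule = Tuple[Hashable, Hashable]  # (state, action) pair, a "rule" in PS terms
Weights = DefaultDict[Rule, float]


def ps_update(weights: Weights, episode: List[Rule], reward: float, gamma: float) -> None:
    """Standard Profit Sharing: the rule i steps before the reward
    receives reward * gamma**i (geometrically decreasing function)."""
    credit = reward
    for rule in reversed(episode):
        weights[rule] += credit
        credit *= gamma


def detour_flags(episode: List[Rule]) -> List[bool]:
    """HYPOTHETICAL detour test by loop erasure: a rule is a detour (True)
    if the agent later came back to the state where it was selected;
    the remaining rules form the loop-free, non-detour path (False)."""
    last_visit = {state: t for t, (state, _) in enumerate(episode)}  # last occurrence wins
    flags = [True] * len(episode)
    t = 0
    while t < len(episode):
        t = last_visit[episode[t][0]]  # skip the loop that returns to this state
        flags[t] = False               # the rule that finally left the state
        t += 1
    return flags


def detour_suppression_update(weights: Weights, episode: List[Rule], reward: float,
                              gamma_path: float, gamma_detour: float) -> None:
    """ASSUMED two-rate scheme: non-detour rules decay by gamma_path per
    loop-free step; detour rules decay by gamma_detour, starting from the
    credit of the non-detour rule that follows them."""
    flags = detour_flags(episode)
    path_credit = reward
    detour_credit = reward * gamma_detour
    for rule, is_detour in zip(reversed(episode), reversed(flags)):
        if is_detour:
            weights[rule] += detour_credit
            detour_credit *= gamma_detour
        else:
            weights[rule] += path_credit
            detour_credit = path_credit * gamma_detour  # for a detour just before this rule
            path_credit *= gamma_path  # loop length does not deepen the path


if __name__ == "__main__":
    # A -> B -> back to A is a detour; A -> C then reaches the reward.
    episode = [("A", "right"), ("B", "left"), ("A", "up"), ("C", "right")]
    print(detour_flags(episode))  # [True, True, False, False]

    plain: Weights = defaultdict(float)
    ps_update(plain, episode, reward=1.0, gamma=0.5)
    dual: Weights = defaultdict(float)
    detour_suppression_update(dual, episode, reward=1.0, gamma_path=0.5, gamma_detour=0.1)
    print(plain[("A", "right")], dual[("A", "right")])
```

With a single rate of 0.5, the detour rule `("A", "right")` still receives 0.125. Under the assumed two-rate scheme it receives about 0.005, while the non-detour rule `("A", "up")` keeps 0.5. Because depth along the loop-free path is counted separately, a large `gamma_path` can be kept for fast learning without the detour inheriting much credit.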