Proposal of a Detour Path Suppression Method in PS Reinforcement Learning and Its Application to an Altruistic Multi-agent Environment

Profit Sharing (PS) is a well-known reinforcement learning method. In PS, a reward is generally distributed by a geometrically decreasing function, whose common ratio is called the discount rate. A large discount rate speeds up learning but may lead to a non-optimal policy. A small discount rate, on the other hand, improves the quality of the learned policy, but learning may not proceed smoothly because the learning depth is shallow. To address this trade-off, we propose a method that reinforces detour paths and the non-detour path with different discount rates. Finally, we apply the method to an altruistic multi-agent environment to confirm its effectiveness.
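
As an illustration of the idea, the following Python sketch shows geometric credit assignment in PS and one way of applying two discount rates. It is not the algorithm of this paper: the abstract does not state how detours are identified or how the two rates are combined. The sketch assumes that a step lies on a detour when it falls inside a loop (between two visits to the same state), and that each step's credit is the reward multiplied by that step's rate raised to its distance from the end of the episode. All function and parameter names (`geometric_credits`, `mark_detour_steps`, `ps_update`, `rate_path`, `rate_detour`) are hypothetical.

```python
from collections import defaultdict


def geometric_credits(reward, episode_length, discount_rate):
    """Credits ordered from the first step to the last.

    The last step receives the full reward; each earlier step receives
    discount_rate times the credit of the step that follows it.
    """
    last = episode_length - 1
    return [reward * discount_rate ** (last - t) for t in range(episode_length)]


def mark_detour_steps(states):
    """ASSUMPTION: a step is on a detour if it lies inside a loop,
    i.e. between two visits to the same state in one episode."""
    on_detour = [False] * len(states)
    last_seen = {}
    for t, state in enumerate(states):
        if state in last_seen:
            # Steps from the previous visit up to t - 1 only led back here.
            for k in range(last_seen[state], t):
                on_detour[k] = True
        last_seen[state] = t
    return on_detour


def ps_update(weights, episode, reward, rate_path, rate_detour):
    """Reinforce each (state, action) rule of the episode.

    ASSUMPTION: detour steps use rate_detour and all other steps use
    rate_path, both raised to the step's distance from the episode end.
    """
    on_detour = mark_detour_steps([state for state, _ in episode])
    last = len(episode) - 1
    for t, (state, action) in enumerate(episode):
        rate = rate_detour if on_detour[t] else rate_path
        weights[(state, action)] += reward * rate ** (last - t)
    return weights


# Hypothetical episode: the agent leaves B, returns to B, then reaches the goal.
episode = [("A", "right"), ("B", "up"), ("C", "down"), ("B", "right"), ("D", "right")]
weights = ps_update(defaultdict(float), episode, reward=100.0,
                    rate_path=0.5, rate_detour=0.1)
```

In this hypothetical episode the rules `("B", "up")` and `("C", "down")` form the loop, so they are reinforced with the smaller rate, while the rules on the direct path keep the larger rate and still receive credit at greater depth.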