On the rationality of profit sharing in multi-agent reinforcement learning

Reinforcement learning is a form of machine learning that adapts an agent to an unknown environment on the basis of rewards. Theoretical treatments of reinforcement learning systems have traditionally assumed that the environment is Markovian. In multi-agent reinforcement learning systems, however, it is important to handle non-Markovian environments. We use Profit Sharing (PS) as the reinforcement learning system and discuss its rationality in multi-agent environments. In particular, we classify non-Markovian environments and discuss how a reward should be shared among reinforcement learning agents. A crane control problem confirms the effectiveness of PS in multi-agent environments.
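
As a rough illustration of the kind of credit assignment PS performs, the sketch below reinforces every rule (state-action pair) fired in an episode once a reward arrives, giving geometrically less credit to rules fired earlier. It is a minimal single-agent sketch under stated assumptions, not the formulation analysed in the paper: the function name `profit_sharing_update`, the `decay` ratio of 0.25, and the state and action labels are all hypothetical, and the sharing of a reward among several agents is not modelled.

```python
from collections import defaultdict


def profit_sharing_update(weights, episode, reward, decay=0.25):
    """Distribute `reward` backwards over the rules fired in one episode.

    The rule fired last receives the full reward; each earlier rule
    receives `decay` times the credit of the rule that followed it.
    The geometric schedule and the value of `decay` are assumptions
    made for this example.
    """
    credit = reward
    for state, action in reversed(episode):
        weights[(state, action)] += credit
        credit *= decay
    return weights


# Hypothetical three-step episode that ends with a reward of 1.0.
weights = defaultdict(float)
episode = [("s0", "left"), ("s1", "right"), ("s2", "lift")]
profit_sharing_update(weights, episode, reward=1.0, decay=0.25)
```

After this call the rule fired last carries the largest weight, and the weights shrink by the factor `decay` for each step further from the reward.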
