Rationality of reward sharing in multi-agent reinforcement learning

Abstract

In multi-agent reinforcement learning systems, it is important to share a reward among all agents. We focus on the Rationality Theorem of Profit Sharing 5) and analyze how a reward should be shared among Profit Sharing agents. When one agent receives a direct reward R (R > 0), each of the other agents receives an indirect reward μR (μ ≥ 0). We derive the following necessary and sufficient condition for preserving rationality: $$\mu < \frac{M - 1}{M^{W}\left(1 - \left(\frac{1}{M}\right)^{W_o}\right)(n - 1)L}$$ where M and L are the maximum numbers of conflicting rules for the same sensory input (counting all rules and only rational rules, respectively), W and W_o are the maximum episode lengths of the direct-reward agent and the indirect-reward agents, respectively, and n is the number of agents. The condition is derived by ruling out the least desirable situation, in which the expected reward per action is zero. Therefore, by applying this theorem, reward sharing can be made efficient in several respects. We confirm the effectiveness of the theorem through numerical examples.
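The sharing condition above can be checked numerically. The following is a minimal Python sketch, assuming the symbols M, L, W, W_o, and n mean exactly what the abstract defines; the function names and the input checks (M ≥ 2 so the denominator is nonzero, n ≥ 2 so there is at least one indirect-reward agent) are our own illustrative assumptions, not part of the paper.

```python
def mu_upper_bound(M: int, L: int, W: int, W_o: int, n: int) -> float:
    """Right-hand side of the sharing condition stated in the abstract:

        mu < (M - 1) / (M**W * (1 - (1/M)**W_o) * (n - 1) * L)
    """
    # Assumed validity checks (not from the paper): M >= 2 keeps
    # (1 - (1/M)**W_o) nonzero, and n >= 2 keeps (n - 1) nonzero.
    if M < 2 or L < 1 or W < 1 or W_o < 1 or n < 2:
        raise ValueError("expected M >= 2, L >= 1, W >= 1, W_o >= 1, n >= 2")
    denominator = M**W * (1 - (1 / M) ** W_o) * (n - 1) * L
    return (M - 1) / denominator


def preserves_rationality(mu: float, M: int, L: int, W: int, W_o: int, n: int) -> bool:
    """True if the indirect-reward ratio mu (mu >= 0) satisfies the strict bound."""
    return 0 <= mu < mu_upper_bound(M, L, W, W_o, n)


def share_reward(R: float, mu: float, direct_agent: int, n: int) -> list[float]:
    """Direct reward R to the agent that earned it; mu * R to every other agent."""
    return [R if i == direct_agent else mu * R for i in range(n)]


if __name__ == "__main__":
    bound = mu_upper_bound(M=2, L=1, W=3, W_o=3, n=2)
    print(f"mu must be below {bound:.6f}")  # prints: mu must be below 0.142857
    print(preserves_rationality(0.1, M=2, L=1, W=3, W_o=3, n=2))  # prints: True
    print(share_reward(R=10.0, mu=0.1, direct_agent=0, n=3))  # prints: [10.0, 1.0, 1.0]
```

For M = 2, L = 1, W = W_o = 3, and n = 2, the bound is 1 / (8 · 0.875 · 1 · 1) = 1/7. Because M^W grows with the direct-reward episode length, (1 - (1/M)^{W_o}) grows with W_o, and (n - 1) grows with the number of agents, longer episodes or more agents push the admissible μ toward zero.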

Shigenobu Kobayashi | Kazuteru Miyazaki

[1] John H. Holland et al. Escaping brittleness: the possibilities of general-purpose learning algorithms applied to parallel rule-based systems, 1995.

[2] Dana H. Ballard et al. Active Perception and Reinforcement Learning, 1990, Neural Computation.

[3] Sandip Sen et al. Multiagent Coordination with Learning Classifier Systems, 1995, Adaption and Learning in Multi-Agent Systems.

[4] Peter Dayan et al. Technical Note: Q-Learning, 2004, Machine Learning.

[5] Gerhard Weiss et al. Learning to Coordinate Actions in Multi-Agent Systems, 1993, IJCAI.

[6] John J. Grefenstette et al. Credit assignment in rule discovery systems based on genetic algorithms, 1988, Machine Learning.

[7] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.

[8] J. Grefenstette. Credit Assignment in Rule Discovery Systems Based on Genetic Algorithms, 2005, Machine Learning.