Paper Information - Reinforcement Learning in Large Multi-agent Systems

Enabling reinforcement learning to be effective in large-scale multi-agent Markov Decision Problems is a challenging task. To address this problem we propose a multi-agent variant of Q-learning: “Q Updates with Immediate Counterfactual Rewards-learning” (QUICR-learning). Given a global reward function over all agents that the large-scale system is trying to maximize, QUICR-learning breaks down the global reward into many agent-specific rewards that have the following two properties: 1) agents maximizing their agent-specific rewards tend to maximize the global reward, and 2) an agent’s action has a large influence on its agent-specific reward, allowing it to learn quickly. Each agent then uses standard Q-learning-type updates to form a policy that maximizes its agent-specific reward. Results on multi-agent grid-world problems over two topologies show that QUICR-learning can be effective with hundreds of agents and can achieve up to 300% improvements in performance over both conventional and local Q-learning in the largest tested systems.
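The two ingredients named in the abstract, an agent-specific reward derived from the global reward and a standard Q-learning update, can be illustrated with a minimal sketch. The abstract does not give the exact form of the agent-specific reward, so the sketch assumes a counterfactual difference: the global reward with all agents present minus the global reward with agent i removed. The toy reward function, the names `global_reward`, `counterfactual_reward`, and `q_update`, and the values of `alpha` and `gamma` are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict

def global_reward(positions, token_values):
    """Hypothetical global reward: each occupied cell pays its token value once."""
    return sum(token_values.get(cell, 0.0) for cell in set(positions))

def counterfactual_reward(i, positions, token_values):
    """Assumed agent-specific reward: G(all agents) - G(all agents except i)."""
    without_i = positions[:i] + positions[i + 1:]
    return global_reward(positions, token_values) - global_reward(without_i, token_values)

def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Standard tabular Q-learning update, fed the agent-specific reward."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]

# Toy grid: two agents share one token cell, a third sits alone on another.
token_values = {(0, 1): 1.0, (2, 2): 0.5}
positions = [(0, 1), (0, 1), (2, 2)]
rewards = [counterfactual_reward(i, positions, token_values)
           for i in range(len(positions))]

# Each agent keeps its own Q-table; here agent 2 learns from its own reward.
q_agent2 = defaultdict(float)
new_value = q_update(q_agent2, "s0", "right", rewards[2], "s1", ["left", "right"])
```

In this toy setting the two agents that share a cell each receive zero, because removing either one leaves the global reward unchanged, while the agent that alone covers a token receives that token's full value. This is the sense in which an agent's own action has a large influence on its agent-specific reward.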

Kagan Tumer | Adrian Agogino
