Reinforcement Learning in Large Multi-agent Systems

Making reinforcement learning effective in large-scale multi-agent Markov Decision Problems is a challenging task. To address this problem, we propose a multi-agent variant of Q-learning: “Q Updates with Immediate Counterfactual Rewards-learning” (QUICR-learning). Given a global reward function over all agents that the large-scale system is trying to maximize, QUICR-learning breaks down the global reward into many agent-specific rewards that have the following two properties: 1) agents maximizing their agent-specific rewards tend to maximize the global reward, and 2) an agent’s action has a large influence on its agent-specific reward, allowing it to learn quickly. Each agent then uses standard Q-learning type updates to form a policy that maximizes its agent-specific reward. Results on multi-agent grid-world problems over two topologies show that QUICR-learning can be effective with hundreds of agents and can achieve up to 300% improvements in performance over both conventional and local Q-learning in the largest tested systems.
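The abstract does not give the exact form of the agent-specific reward, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes the counterfactual reward is the global reward minus the global reward recomputed with the agent removed, and it uses a toy grid-world reward (the summed value of distinct occupied cells). The names `global_reward`, `counterfactual_reward`, and `q_update` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def global_reward(positions, token_values):
    """Toy global reward (an assumption): sum of values of distinct occupied cells."""
    return sum(token_values.get(cell, 0.0) for cell in set(positions))


def counterfactual_reward(agent, positions, token_values):
    """Assumed agent-specific reward: global reward minus the global reward
    that would result if this agent were absent from the system."""
    without_agent = positions[:agent] + positions[agent + 1:]
    return (global_reward(positions, token_values)
            - global_reward(without_agent, token_values))


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.5, gamma=0.9):
    """Standard tabular Q-learning update, applied to one agent's own table."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next
                                   - q[(state, action)])


# Two agents share cell (0, 0); a third is alone on cell (1, 1).
positions = [(0, 0), (0, 0), (1, 1)]
token_values = {(0, 0): 1.0, (1, 1): 2.0}

# Agent 0 is redundant in its cell, so removing it changes nothing.
r0 = counterfactual_reward(0, positions, token_values)
# Agent 2 alone collects the value of cell (1, 1).
r2 = counterfactual_reward(2, positions, token_values)

# Each agent keeps its own Q-table and learns from its own reward.
q_agent2 = defaultdict(float)
q_update(q_agent2, "s", "stay", r2, "s_next", ["stay", "move"])
```

In this toy setting the redundant agent receives no credit, while the agent that alone covers a valuable cell receives that cell's full value. This illustrates the two properties the abstract names: the reward stays aligned with the global reward while remaining sensitive to the individual agent's own action.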
