In this paper, we investigate the use of hierarchical reinforcement learning (HRL) to speed up the acquisition of cooperative multi-agent tasks. We introduce a hierarchical multi-agent RL framework and present a hierarchical multi-agent RL algorithm called Cooperative HRL. The fundamental property of our approach is that the hierarchy allows agents to learn coordination faster by sharing information at the level of subtasks, rather than attempting to learn coordination at the level of primitive actions. We study the performance of the Cooperative HRL algorithm on a four-agent automated guided vehicle (AGV) scheduling problem. We also address the issue of rational communication behavior among autonomous agents. The goal is for agents to learn both action and communication policies that together optimize task performance given a communication cost. We extend our multi-agent HRL framework to include communication decisions and present a cooperative multi-agent HRL algorithm called COM-Cooperative HRL. We demonstrate the efficiency of this algorithm, as well as the relation between the communication cost and the learned communication policy, using a multi-agent taxi problem.