In this paper, we investigate the use of hierarchical reinforcement learning (HRL) to speed up the acquisition of cooperative multi-agent tasks. We introduce a hierarchical multi-agent RL framework and present a hierarchical multi-agent RL algorithm called Cooperative HRL. The fundamental property of our approach is that the use of hierarchy allows agents to learn coordination faster by sharing information at the level of subtasks, rather than attempting to learn coordination at the level of primitive actions. We study the performance of the Cooperative HRL algorithm using a four-agent automated guided vehicle (AGV) scheduling problem. We also address the issue of rational communication behavior among autonomous agents. The goal is for agents to learn both action and communication policies that together optimize the task given a communication cost. We extend our multi-agent HRL framework to include communication decisions and present a cooperative multi-agent HRL algorithm called COM-Cooperative HRL. We demonstrate the efficiency of this algorithm, as well as the relation between the communication cost and the learned communication policy, using a multi-agent taxi problem.
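The core idea of coordinating at the subtask level can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the class name, state/subtask encodings, and hyperparameters are all assumptions. Each agent learns Q-values over its own high-level subtask choice conditioned on the subtasks its teammates are currently executing, rather than over joint primitive actions, which shrinks the coordination space dramatically.

```python
import random
from collections import defaultdict

class SubtaskCoordinationLearner:
    """Illustrative sketch: tabular Q-learning where an agent's value
    depends on (state, teammates' current subtasks, own subtask choice),
    i.e. coordination is learned at the subtask level, not over
    primitive joint actions."""

    def __init__(self, subtasks, alpha=0.1, gamma=0.95, epsilon=0.2):
        self.subtasks = subtasks              # high-level subtasks available to this agent
        self.q = defaultdict(float)           # key: (state, others_subtasks, own_subtask)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, state, others_subtasks):
        # Epsilon-greedy choice of the agent's next subtask, conditioned
        # on what the other agents are currently doing.
        if random.random() < self.epsilon:
            return random.choice(self.subtasks)
        return max(self.subtasks, key=lambda s: self.q[(state, others_subtasks, s)])

    def update(self, state, others_subtasks, subtask, reward, next_state, next_others):
        # Standard one-step Q-learning backup over subtask choices.
        best_next = max(self.q[(next_state, next_others, s)] for s in self.subtasks)
        key = (state, others_subtasks, subtask)
        self.q[key] += self.alpha * (reward + self.gamma * best_next - self.q[key])
```

Because the table is keyed on teammates' subtasks rather than their primitive actions, the amount of shared information (and hence any communication cost) scales with the number of subtasks, which is the intuition behind both Cooperative HRL and its communication-aware extension.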