Decentralized coordination via task decomposition and reward shaping

In this work, we introduce a method for decentralized coordination in cooperative multiagent multi-task problems where the subtasks and agents are homogeneous. With the proposed method, agents cooperate at the high-level task-selection stage using the knowledge they gather while learning the subtasks. We introduce a subtask selection method for single-agent multi-task MDPs and extend it to multiagent multi-task MDPs by using reward shaping at the subtask level to coordinate the agents. Our results on a multi-rover problem show that agents using the combination of task decomposition and subtask-based difference rewards achieve significant improvements in both learning speed and the quality of the converged policies.
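For reference, the difference reward underlying the shaping signal mentioned above is commonly written as below; this is the standard formulation, and the symbols $G$, $z$, and $z_{-i}$ are generic notation rather than the paper's own, so the exact subtask-level variant used here may differ:

\[
D_i(z) \;=\; G(z) \;-\; G(z_{-i}),
\]

where $G(z)$ is the global reward for the joint state-action $z$ of all agents and $z_{-i}$ denotes the same joint quantity with agent $i$'s contribution removed or replaced by a default. Shaping each agent's subtask reward with $D_i$ keeps the learning signal aligned with the global objective while isolating each agent's individual impact, which is what makes decentralized coordination at the task-selection level feasible.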