Automata Guided Semi-Decentralized Multi-Agent Reinforcement Learning

This paper investigates the problem of deploying a multi-robot team to satisfy a syntactically co-safe Truncated Linear Temporal Logic (scTLTL) task specification via multi-agent reinforcement learning (MARL). Because the agents considered here are heterogeneous, typical approaches cannot avoid an explicit task assignment step, which is inherently difficult and, when performed manually, can sacrifice optimality (e.g., shortest paths). scTLTL is exploited here to eliminate explicit task assignment by folding it into the learning process. MARL usually requires some direct or indirect coordination among agents to promote convergence, and the temporal nature of scTLTL requires tracking the progress toward satisfying the specification. A Finite State Automaton (FSA) is used to address both issues. An FSA-augmented Markov Decision Process (MDP) is constructed in which all agents share the FSA state, which carries the global task progress. Moreover, a metric called the robustness degree is employed to replace the Boolean semantics and quantify the reward for gradually satisfying the scTLTL formula. Consequently, a language-guided semi-decentralized Q-learning algorithm is proposed to maximize the return over the FSA-augmented MDP. Simulation results demonstrate the effectiveness of the semi-decentralized multi-agent Q-learning algorithm while significantly reducing the complexity.
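
As a rough illustration of the approach summarized above, the sketch below shows one possible form of semi-decentralized Q-learning over an FSA-augmented MDP: each agent keeps its own Q-table over (local MDP state, shared FSA state), and a reward derived from the robustness degree is granted when the shared FSA state advances. All names, dynamics, and the specific reward shaping here are illustrative assumptions, not the paper's implementation.

    # Minimal sketch: semi-decentralized Q-learning over an FSA-augmented MDP.
    # Each agent learns over (its own MDP state, shared FSA state); the FSA state
    # is common to the team and tracks progress toward the scTLTL specification.
    import random
    from collections import defaultdict

    class FSAAugmentedQLearner:
        def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1):
            self.q = defaultdict(float)   # Q[(mdp_state, fsa_state, action)]
            self.actions = actions
            self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

        def act(self, s, q_fsa):
            # epsilon-greedy over the agent's own Q-table
            if random.random() < self.epsilon:
                return random.choice(self.actions)
            return max(self.actions, key=lambda a: self.q[(s, q_fsa, a)])

        def update(self, s, q_fsa, a, r, s_next, q_fsa_next):
            # standard Q-learning update on the FSA-augmented state
            best_next = max(self.q[(s_next, q_fsa_next, b)] for b in self.actions)
            td = r + self.gamma * best_next - self.q[(s, q_fsa, a)]
            self.q[(s, q_fsa, a)] += self.alpha * td

    # --- hypothetical environment hooks (assumptions, not from the paper) ---
    def fsa_transition(q_fsa, joint_state):
        """Advance the shared FSA state when the joint state satisfies a guard."""
        return q_fsa  # placeholder

    def robustness_reward(q_fsa, q_fsa_next, joint_state):
        """Reward derived from the robustness degree of the scTLTL formula."""
        return 1.0 if q_fsa_next != q_fsa else 0.0  # placeholder shaping

    def step(states, actions):
        """Apply each agent's action to its own MDP; dynamics are a placeholder."""
        return states  # placeholder

    def team_step(agents, states, q_fsa):
        # one training step for a team of agents sharing the FSA state q_fsa
        actions = [ag.act(s, q_fsa) for ag, s in zip(agents, states)]
        next_states = step(states, actions)
        q_fsa_next = fsa_transition(q_fsa, tuple(next_states))
        r = robustness_reward(q_fsa, q_fsa_next, tuple(next_states))
        for ag, s, a, s_next in zip(agents, states, actions, next_states):
            ag.update(s, q_fsa, a, r, s_next, q_fsa_next)
        return next_states, q_fsa_next

The sketch is semi-decentralized in the sense that each agent updates its own value function, while the only globally shared quantity is the FSA state that encodes task progress.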