Paper Information - Multi-Agent Path Finding via Tree LSTM

Multi-Agent Path Finding via Tree LSTM

In recent years, Multi-Agent Path Finding (MAPF) has attracted attention from both the Operations Research (OR) and Reinforcement Learning (RL) communities. However, in the 2021 Flatland3 Challenge, a competition on MAPF, the best RL method scored only 27.9, far lower than the best OR method. This paper proposes a new RL solution to the Flatland3 Challenge that scores 125.3, roughly 4.5 times the previous best RL score. Our solution applies a novel network architecture, TreeLSTM, to MAPF. Combined with several other RL techniques, including reward shaping, multiple-phase training, and centralized control, it performs comparably to the top 2-3 OR methods.

Qimai Li | Kunjie Zhang | Yuhao Jiang | Jiaxin Chen | Xiaolong Zhu
