Flatland-RL: Multi-Agent Reinforcement Learning on Trains

Efficient automated scheduling of trains remains a major challenge for modern railway systems. The underlying vehicle rescheduling problem (VRSP) has been a major focus of Operations Research (OR) for decades. Traditional approaches study the VRSP with complex simulators, in which experimenting with a broad range of novel ideas is time-consuming and incurs a large computational overhead. In this paper, we introduce a simplified two-dimensional grid environment called "Flatland" that allows for faster experimentation. Flatland not only reduces the complexity of the full physical simulation, but also provides an easy-to-use interface for testing novel approaches to the VRSP, such as Reinforcement Learning (RL) and Imitation Learning (IL). To probe the potential of Machine Learning (ML) research on Flatland, we (1) ran a first series of RL and IL experiments and (2) designed and executed a public benchmark at NeurIPS 2020 to engage a large community of researchers with this problem. On the one hand, our own experimental results demonstrate that ML has potential for solving the VRSP on Flatland. On the other hand, we identify key topics that need further research. Overall, the Flatland environment has proven to be a robust and valuable framework for investigating the VRSP for railway networks. Our experiments provide a good starting point for further research and for the participants of the NeurIPS 2020 Flatland Benchmark. Together, these efforts have the potential to substantially shape the mobility of the future.
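Environments of this kind typically expose a Gym-style, dictionary-keyed multi-agent loop: each agent submits an action, and the environment returns per-agent observations, rewards, and done flags. The sketch below is a hypothetical, self-contained stand-in illustrating that interaction style; it is not the actual flatland-rl API, and all names (`ToyRailEnv`, the action encoding, the `"__all__"` done key) are assumptions for illustration only.

```python
# Hypothetical minimal sketch of a dict-keyed multi-agent grid interface,
# illustrating the interaction style that Flatland-like environments expose.
# NOT the flatland-rl API; all names and semantics here are assumptions.

class ToyRailEnv:
    """Agents move along a 1-D track toward a shared target cell."""

    def __init__(self, length=5, n_agents=2):
        self.length = length
        self.n_agents = n_agents
        self.positions = {}

    def reset(self):
        # All agents start at cell 0; the observation is each agent's position.
        self.positions = {a: 0 for a in range(self.n_agents)}
        return dict(self.positions)

    def step(self, actions):
        # actions: dict of agent_id -> 0 (wait) or 1 (move forward one cell)
        rewards, dones = {}, {}
        for a, act in actions.items():
            if act == 1 and self.positions[a] < self.length - 1:
                self.positions[a] += 1
            at_goal = self.positions[a] == self.length - 1
            rewards[a] = 0.0 if at_goal else -1.0  # step penalty until arrival
            dones[a] = at_goal
        dones["__all__"] = all(dones[a] for a in range(self.n_agents))
        return dict(self.positions), rewards, dones, {}


env = ToyRailEnv(length=3, n_agents=2)
obs = env.reset()
while True:
    obs, rewards, dones, info = env.step({a: 1 for a in range(env.n_agents)})
    if dones["__all__"]:
        break
print(obs)  # both agents reach the last cell: {0: 2, 1: 2}
```

A policy (learned via RL, or via IL from an OR planner's demonstrations) would replace the hard-coded "always move" action dictionary in the loop above.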
