We present preliminary results from our sixth-placed entry to the Flatland international competition for train rescheduling. These results include two improvements to reinforcement learning (RL) training efficiency. They also include two hypotheses about the prospects of deep RL for complex real-world control tasks. First, current state-of-the-art policy gradient methods seem ill-suited to high-consequence environments. Second, learning explicit communication actions (an emerging machine-to-machine language, so to speak) might offer a remedy. Both hypotheses remain to be confirmed by future work. If confirmed, they hold promise for optimizing highly efficient logistics ecosystems such as the railway network of the Swiss Federal Railways.