Distributed Reinforcement Learning for Optimizing Resource Allocation in Autonomous Logistics Processes

This paper examines multi-agent coordination for resource allocation tasks in autonomous logistics processes and identifies requirements for learning optimal behavior in a multi-agent setting. Based on a real-world logistics application, the paper distinguishes between single resource allocation by independent agents and joint activities performed by agent teams. For both cases, it introduces adaptations of the Q-learning algorithm and evaluates their convergence and their scalability to large scenarios. The results show that the known conditions for the convergence of multi-agent reinforcement learning are insufficient, which leads the paper to identify an additional requirement for convergence.