Actor Based Simulation for Closed Loop Control of Supply Chain using Reinforcement Learning

Reinforcement Learning (RL) has achieved a degree of success in control applications such as online gameplay and robotics, but has rarely been used to manage operations of business-critical systems such as supply chains. A key aspect of using RL in the real world is to train the agent before deployment, so as to minimise experimentation in live operation. While this is feasible for online gameplay (where the rules of the game are known) and robotics (where the dynamics are predictable), it is much more difficult for complex systems due to associated complexities, such as uncertainty, adaptability and emergent behaviour. In this paper, we describe a framework for effective integration of a reinforcement learning controller with an actor-based simulation of the complex networked system, in order to enable deployment of the RL agent in the real system with minimal further tuning.