Dynamic Dispatching for Large-Scale Heterogeneous Fleet via Multi-agent Deep Reinforcement Learning

Dynamic dispatching is one of the core problems in operation optimization for traditional industries such as mining: it is about smartly allocating the right resources to the right place at the right time. Conventionally, the industry relies on heuristics or even human intuition, which often yield short-sighted and sub-optimal solutions. Leveraging the power of AI and the Internet of Things (IoT), data-driven automation is reshaping this area. However, the mining setting poses its own challenges, such as a large fleet of heterogeneous trucks operating in a highly dynamic environment, so methods developed in other domains (e.g., ride-sharing) can rarely be adopted directly. In this paper, we propose a novel deep reinforcement learning approach to solve the dynamic dispatching problem in mining. We first develop an event-based mining simulator with parameters calibrated on real mines. We then propose an experience-sharing Deep Q-Network with a novel abstract state/action representation that learns from the memories of heterogeneous agents jointly, realizing learning in a centralized way. We demonstrate that the proposed method significantly outperforms the most widely adopted approaches in the industry, improving productivity by $5.56\%$. As a general framework for dynamic resource allocation, the proposed approach holds great potential for a broader range of industries (e.g., manufacturing, logistics) that operate large-scale heterogeneous equipment in highly dynamic environments.
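The core idea of the abstract — heterogeneous agents writing transitions into one shared memory, consumed by a single centralized learner over an abstract (type-normalized) state space — can be illustrated with a minimal sketch. This is not the paper's implementation: the class and function names (`SharedReplayBuffer`, `CentralQLearner`, `abstract_state`) are hypothetical, and a tabular Q-learner stands in for the Deep Q-Network purely for brevity.

```python
import random
from collections import defaultdict, deque

class SharedReplayBuffer:
    """One buffer shared by all agents: heterogeneous trucks push their
    transitions here so a single learner can train on all of them."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

class CentralQLearner:
    """Centralized learning: one value function is updated from the shared
    memories of every agent (tabular stand-in for a DQN)."""
    def __init__(self, n_actions, alpha=0.1, gamma=0.95):
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.alpha, self.gamma = alpha, gamma

    def update(self, batch):
        for s, a, r, s_next in batch:
            target = r + self.gamma * max(self.q[s_next])
            self.q[s][a] += self.alpha * (target - self.q[s][a])

    def act(self, s, eps=0.1):
        if random.random() < eps:
            return random.randrange(len(self.q[s]))
        qs = self.q[s]
        return qs.index(max(qs))

def abstract_state(truck_capacity, queue_len, max_capacity=300, max_queue=10):
    # Abstract state representation: normalize truck-specific features so that
    # transitions from different truck types live in one shared state space.
    return (round(truck_capacity / max_capacity, 1), min(queue_len, max_queue))

# Two heterogeneous trucks (different capacities) share one buffer and learner.
buf = SharedReplayBuffer()
learner = CentralQLearner(n_actions=2)
buf.push(abstract_state(240, 3), 0, 1.0, abstract_state(240, 2))  # large truck
buf.push(abstract_state(100, 5), 1, 0.5, abstract_state(100, 4))  # small truck
learner.update(buf.sample(2))
```

The abstract representation is what makes the sharing sound: without normalizing away truck-specific magnitudes, memories from a 100-ton truck would not be reusable by the learner serving a 240-ton truck.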
