Deep Multi-Agent Reinforcement Learning using DNN-Weight Evolution to Optimize Supply Chain Performance

To develop a supply chain management (SCM) system that performs optimally both for each entity in the chain and for the chain as a whole, a multi-agent reinforcement learning (MARL) technique has been developed. To solve two problems of MARL for SCM (building a Markov decision process for a supply chain and avoiding learning stagnation of a kind similar to the “prisoner’s dilemma”), a learning management method with deep-neural-network (DNN)-weight evolution (LM-DWE) has been developed. Experiments with a four-agent system were performed, using a beer distribution game (BDG) as an example of a supply chain. The LM-DWE solved both problems and achieved an 80.0% lower total cost than expert players of the BDG.
