Deep reinforcement learning for semiconductor production scheduling

Despite tremendous success stories such as identifying cat videos [1] and mastering video as well as board games [2], [3], the adoption of deep learning in the semiconductor industry remains moderate. In this paper, we apply Google DeepMind's Deep Q Network (DQN) agent algorithm for Reinforcement Learning (RL) to semiconductor production scheduling. In an RL environment, several cooperative DQN agents, which utilize deep neural networks, are trained with flexible user-defined objectives. We present benchmarks comparing standard dispatching heuristics with the DQN agents in an abstract frontend-of-line semiconductor production facility. Results are promising and show that DQN agents optimize production autonomously for different targets.
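As a minimal sketch of the DQN recipe the abstract refers to (a small Q-network, an epsilon-greedy dispatching policy, and experience replay), the toy example below trains a hand-rolled NumPy Q-network on a hypothetical single-machine dispatching environment. This is an illustration under stated assumptions, not the paper's actual agents, environment, or objectives; all names (`ToyDispatchEnv`, `TinyDQN`) and hyperparameters are invented for the sketch.

```python
import random
from collections import deque
import numpy as np

class ToyDispatchEnv:
    """Hypothetical stand-in environment: the state is the normalized
    remaining work per job family; the action picks which family to
    dispatch next. The reward favors draining the longest queue, a
    placeholder for a user-defined scheduling objective."""
    def __init__(self, n_families=3, seed=0):
        self.n = n_families
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        self.queues = self.rng.integers(1, 10, self.n).astype(float)
        return self.queues / 10.0

    def step(self, action):
        reward = 1.0 if self.queues[action] == self.queues.max() else -0.1
        self.queues[action] = max(self.queues[action] - 1.0, 0.0)
        done = bool(self.queues.sum() == 0)
        return self.queues / 10.0, reward, done

class TinyDQN:
    """One-hidden-layer Q-network trained by SGD on transitions sampled
    from a replay buffer -- the core DQN ingredients, heavily simplified
    (no target network, no convolutional layers)."""
    def __init__(self, n_in, n_out, hidden=16, lr=0.01, gamma=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.5, (n_in, hidden))
        self.W2 = rng.normal(0.0, 0.5, (hidden, n_out))
        self.lr, self.gamma = lr, gamma
        self.replay = deque(maxlen=500)

    def q(self, s):
        h = np.tanh(s @ self.W1)          # hidden activations
        return h @ self.W2, h             # Q-values per action

    def act(self, s, eps):
        if random.random() < eps:         # epsilon-greedy exploration
            return random.randrange(self.W2.shape[1])
        return int(np.argmax(self.q(s)[0]))

    def train_step(self, batch_size=8):
        if len(self.replay) < batch_size:
            return
        for s, a, r, s2, done in random.sample(list(self.replay), batch_size):
            q_vals, h = self.q(s)
            target = r if done else r + self.gamma * np.max(self.q(s2)[0])
            err = target - q_vals[a]
            # Gradient of 0.5 * err^2 w.r.t. both weight matrices.
            self.W2 += self.lr * np.outer(h, np.eye(len(q_vals))[a]) * err
            dh = self.W2[:, a] * err
            self.W1 += self.lr * np.outer(s, (1.0 - h ** 2) * dh)

env = ToyDispatchEnv()
agent = TinyDQN(n_in=env.n, n_out=env.n)
for episode in range(30):
    s = env.reset()
    for t in range(100):                  # cap episode length
        a = agent.act(s, eps=0.2)
        s2, r, done = env.step(a)
        agent.replay.append((s, a, r, s2, done))
        agent.train_step()
        s = s2
        if done:
            break
```

The cooperative multi-agent setup described in the abstract would run several such agents, one per work center, against a shared factory simulation; the reward function is where the "flexible user-defined objectives" enter.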

[1] Lenz Belzner, et al. Optimization of global production scheduling with deep reinforcement learning, 2018.

[2] Wilfried Brauer, et al. Multi-machine scheduling - a multi-agent learning approach, 1998, Proceedings International Conference on Multi Agent Systems (Cat. No.98EX160).

[3] Srikanth Kandula, et al. Resource Management with Deep Reinforcement Learning, 2016, HotNets.

[4] Michael O. Duff, et al. Reinforcement Learning Methods for Continuous-Time Markov Decision Problems, 1994, NIPS.

[5] Wei Zhang, et al. A Reinforcement Learning Approach to Job-Shop Scheduling, 1995, IJCAI.

[6] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.

[8] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.

[9] Jie Wang, et al. Optimized Adaptive Scheduling of a Manufacturing Process System with Multi-skill Workforce and Multiple Machine Types: An Ontology-based, Multi-agent Reinforcement Learning Approach, 2016.

[10] Martin A. Riedmiller, et al. A Neural Reinforcement Learning Approach to Learn Local Dispatching Policies in Production Scheduling, 1999, IJCAI.

[11] Marc'Aurelio Ranzato, et al. Building high-level features using large scale unsupervised learning, 2011, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[12] Tapas K. Das, et al. A multi-agent reinforcement learning approach to obtaining dynamic control policies for stochastic lot scheduling problem, 2005, Simul. Model. Pract. Theory.

[13] Sridhar Mahadevan, et al. Optimizing Production Manufacturing Using Reinforcement Learning, 1998, FLAIRS.

[14] Florin Pop, et al. New scheduling approach using reinforcement learning for heterogeneous distributed systems, 2017, J. Parallel Distributed Comput.

[15] Thomas G. Dietterich, et al. High-Performance Job-Shop Scheduling With A Time-Delay TD(λ) Network, 1995, NIPS.

[16] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.

[17] Alexander Mordvintsev, et al. Inceptionism: Going Deeper into Neural Networks, 2015.

[18] Sanja Petrovic, et al. Survey of Dynamic Scheduling in Manufacturing Systems, 2006.

[19] Martin A. Riedmiller, et al. Scaling Adaptive Agent-Based Reactive Job-Shop Scheduling to Large-Scale Problems, 2007, 2007 IEEE Symposium on Computational Intelligence in Scheduling.

[20] Guigang Zhang, et al. Deep Learning, 2016, Int. J. Semantic Comput.

[21] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.