Reinforcement Learning Methods for Operations Research Applications: The Order Release Problem

An important goal of Manufacturing Planning and Control systems is to achieve short and predictable flow times, especially where high flexibility in meeting customer demand is required. Short flow times must not come at the expense of output or due-date performance, both of which should remain high. One way to address this problem is an order release mechanism, which collects all incoming orders in an order pool and then determines when to release each order to the shop floor. A major disadvantage of traditional order release mechanisms is that they cannot account for the nonlinear relationship between resource utilization and flow times, which is well known from practice and from queuing theory. We therefore propose a novel adaptive order release mechanism that uses deep reinforcement learning to set the release times of orders, and we provide several techniques for tackling challenging operations research problems with reinforcement learning. Using a simulation model of a two-stage flow shop, we show that our approach outperforms well-known order release mechanisms.
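
The nonlinear relationship between utilization and flow time mentioned above can be illustrated with a minimal sketch. It assumes a single M/M/1 station, which is our simplification and not the two-stage flow-shop model used in the paper; the function name `mm1_flow_time` and its parameters are hypothetical. Under that assumption the expected flow time is W = (1/mu) / (1 - rho), where rho is the utilization and 1/mu the mean processing time.

```python
def mm1_flow_time(utilization: float, mean_processing_time: float = 1.0) -> float:
    """Expected time in system for an M/M/1 queue: W = (1/mu) / (1 - rho).

    Assumption for illustration only: one station with Poisson arrivals and
    exponential processing times. This is not the paper's simulation model.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return mean_processing_time / (1.0 - utilization)


if __name__ == "__main__":
    # Flow time grows slowly at moderate utilization and steeply near 1.
    for rho in (0.5, 0.8, 0.9, 0.95):
        print(f"utilization={rho:.2f}  expected flow time={mm1_flow_time(rho):.1f}")
```

In this simplified setting, raising utilization from 0.5 to 0.9 multiplies the expected flow time by five, which is the kind of effect a release rule with a fixed workload norm does not capture.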
