An Approximate Dynamic Programming Approach for Job Releasing and Sequencing in a Reentrant Manufacturing Line

This paper presents the application of an approximate dynamic programming (ADP) algorithm to the job releasing and sequencing problem of a benchmark reentrant manufacturing line (RML). The ADP approach is based on the SARSA(λ) algorithm with linear approximation architectures whose parameters are tuned through gradient descent. The optimization is performed under a discounted cost criterion that seeks both to minimize inventory costs and to maximize throughput. Simulation experiments using different approximation architectures compare the performance of optimal strategies against the policies obtained with ADP. The results show a statistical match in performance between the optimal policies and the approximate policies obtained through ADP, and they suggest that the ADP algorithm presented in this paper may be a promising approach for larger RML systems.
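The abstract describes SARSA(λ) with a linear approximation architecture tuned by gradient descent under a discounted cost criterion. The sketch below illustrates that combination on a hypothetical single-buffer release problem; the ToyReleaseEnv environment, the feature map, and all cost coefficients are illustrative assumptions and do not reproduce the paper's benchmark RML or its tuning.

```python
import numpy as np

# Minimal sketch of SARSA(lambda) with linear function approximation and
# gradient-descent weight updates under a discounted-cost criterion.
# The environment, features, and cost coefficients are hypothetical
# placeholders, NOT the benchmark RML studied in the paper.

class ToyReleaseEnv:
    """Hypothetical single-buffer line: action 0 = hold, 1 = release a job."""
    def __init__(self, capacity=10, p_finish=0.3, horizon=50):
        self.capacity, self.p_finish, self.horizon = capacity, p_finish, horizon

    def reset(self):
        self.x, self.t = 0, 0
        return self.x

    def step(self, action):
        if action == 1 and self.x < self.capacity:
            self.x += 1                      # release a job into the buffer
        served = self.x > 0 and np.random.rand() < self.p_finish
        if served:
            self.x -= 1                      # job completes (throughput)
        cost = 1.0 * self.x - 5.0 * served   # inventory cost minus throughput credit
        self.t += 1
        return self.x, cost, self.t >= self.horizon


def features(x, a, capacity=10):
    """Hypothetical linear approximation architecture: [1, x, x^2, a, a*x]."""
    xn = x / capacity
    return np.array([1.0, xn, xn * xn, float(a), float(a) * xn])


def sarsa_lambda(env, gamma=0.95, lam=0.7, alpha=0.05, eps=0.1, episodes=500):
    actions = (0, 1)
    w = np.zeros(len(features(0, 0)))
    q = lambda x, a: w @ features(x, a)
    greedy = lambda x: min(actions, key=lambda a: q(x, a))   # minimize approximate cost-to-go
    for _ in range(episodes):
        z = np.zeros_like(w)                 # eligibility trace
        x = env.reset()
        a = greedy(x) if np.random.rand() > eps else np.random.choice(actions)
        done = False
        while not done:
            x2, cost, done = env.step(a)
            a2 = greedy(x2) if np.random.rand() > eps else np.random.choice(actions)
            target = cost + (0.0 if done else gamma * q(x2, a2))
            delta = target - q(x, a)         # temporal-difference error
            z = gamma * lam * z + features(x, a)
            w = w + alpha * delta * z        # gradient-style linear update
            x, a = x2, a2
    return w


if __name__ == "__main__":
    w = sarsa_lambda(ToyReleaseEnv())
    print("learned weights:", w)
```

The epsilon-greedy policy selects the action with the smallest approximate cost-to-go, mirroring the cost-minimization (rather than reward-maximization) orientation of the discounted criterion described above.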
