Control of a re-entrant line manufacturing model with a reinforcement learning approach

This paper applies a reinforcement learning (RL) approach to the near-optimal control of a re-entrant line manufacturing (RLM) model. The approach uses an algorithm based on a gradient-descent TD(lambda) method to estimate both the optimal cost function and the control actions. Numerical experiments demonstrated the efficacy of the approach in estimating optimal actions: the resulting performance closely approximates that of the optimal strategy. Generalizations of the RL approach may have the advantage of scaling appropriately to RLM models with different state-space and action-space dimensions.
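As a rough illustration of the kind of update a gradient-descent TD(lambda) method performs, the sketch below estimates a discounted cost function with a linear approximation J(x) ~ w . phi(x) under a fixed policy. It is not the algorithm used in the paper: the function names, the step sizes, the single-buffer toy queue, and its arrival and service probabilities are all assumptions made for this example, and the selection of control actions is not shown.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def td_lambda_linear(features, step, n_steps, x0=0,
                     alpha=0.05, gamma=0.95, lam=0.7, seed=0):
    """Gradient-descent TD(lambda) for a linear cost approximation.

    features(x) -> list of floats (hypothetical feature map phi)
    step(x, rng) -> (next_state, one_step_cost) under a fixed policy
    Returns the learned weight vector w.
    """
    rng = random.Random(seed)
    x = x0
    w = [0.0] * len(features(x))
    z = [0.0] * len(w)  # accumulating eligibility trace
    for _ in range(n_steps):
        phi = features(x)
        x_next, cost = step(x, rng)
        # Temporal-difference error for the discounted-cost criterion.
        delta = cost + gamma * dot(w, features(x_next)) - dot(w, phi)
        # For a linear approximation, the gradient w.r.t. w is phi.
        z = [gamma * lam * zi + pi for zi, pi in zip(z, phi)]
        w = [wi + alpha * delta * zi for wi, zi in zip(w, z)]
        x = x_next
    return w


# Toy example (assumed, not from the paper): one buffer of capacity CAP,
# arrival with probability 0.3, service with probability 0.5,
# and a one-step cost equal to the current buffer level.
CAP = 5


def queue_features(x):
    # Constant term plus normalized buffer level.
    return [1.0, x / CAP]


def queue_step(x, rng):
    cost = float(x)
    if rng.random() < 0.3 and x < CAP:
        x += 1
    if rng.random() < 0.5 and x > 0:
        x -= 1
    return x, cost


if __name__ == "__main__":
    w = td_lambda_linear(queue_features, queue_step, n_steps=20000)
    print("learned weights:", w)
```

With a one-hot feature per state the same routine reduces to tabular TD(lambda). A control method would additionally use the estimated cost function to choose actions, for example by comparing the estimated cost-to-go of each admissible action.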
