Intelligent dynamic control policies for serial production lines

Heuristic production control policies such as CONWIP, kanban, and other hybrid policies have been in use for years as better alternatives to MRP-based push control policies. Although efficient, these policies are far from optimal. Our goal is to develop a methodology that, for a given system, finds a dynamic control policy via intelligent agents. Such a policy achieves the productivity goal of the system (i.e., the demand service rate) while optimizing a cost/reward function based on the WIP inventory. To this end, we applied a simulation-based optimization technique called Reinforcement Learning (RL) to a four-station serial line. The control policy obtained with an RL algorithm was compared with existing policies on the basis of total average WIP and average cost of WIP. We also developed a heuristic control policy based on a close examination of the policies obtained by the RL algorithm. This heuristic policy, named Behavior-Based Control (BBC), ranked second to the RL policy but proved to be a more efficient and leaner control policy than most of the existing policies in the literature. The performance of the BBC policy was comparable to that of the Extended Kanban Control System (EKCS), which, in our experiments, turned out to be the best of the existing policies. The numerical results used for comparison were obtained from a four-station serial line under two demand arrival processes (constant and Poisson).
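To make the comparison metrics concrete, the following is a minimal sketch of how total average WIP and demand service rate might be estimated by simulation for one baseline policy, CONWIP, on a four-station serial line with constant demand. It is not the simulation model used in the paper: the discrete-time structure, the Bernoulli completion probability `p_finish`, the demand interval `demand_every`, and the function name `simulate_conwip` are all illustrative assumptions.

```python
import random


def simulate_conwip(wip_cap, n_stations=4, p_finish=0.8,
                    demand_every=2, horizon=10_000, seed=0):
    """Toy discrete-time CONWIP line (all parameters are assumptions).

    Returns (average WIP, fraction of demands served from stock).
    """
    rng = random.Random(seed)
    buffers = [0] * n_stations  # jobs waiting or in service at each station
    finished = 0                # finished-goods inventory
    backlog = 0                 # demands waiting for a finished job
    served = 0                  # demands met immediately from stock
    demanded = 0
    wip_sum = 0

    for t in range(horizon):
        # Constant demand: one unit every `demand_every` periods.
        if t % demand_every == 0:
            demanded += 1
            if finished > 0:
                finished -= 1
                served += 1
            else:
                backlog += 1

        # Process stations from last to first so that a job advances
        # at most one station per period.
        for i in reversed(range(n_stations)):
            if buffers[i] > 0 and rng.random() < p_finish:
                buffers[i] -= 1
                if i + 1 < n_stations:
                    buffers[i + 1] += 1
                elif backlog > 0:
                    backlog -= 1   # job goes straight to a waiting demand
                else:
                    finished += 1

        # CONWIP rule: release a new job only while the total number of
        # jobs in the system (line plus finished goods) is below the cap.
        wip = sum(buffers) + finished
        if wip < wip_cap:
            buffers[0] += 1
            wip += 1

        wip_sum += wip

    return wip_sum / horizon, served / demanded


if __name__ == "__main__":
    for cap in (2, 4, 6, 8):
        avg_wip, service = simulate_conwip(cap)
        print(f"cap={cap}: avg WIP={avg_wip:.2f}, service rate={service:.3f}")
```

Sweeping `wip_cap` as in the usage loop shows the trade-off that any such policy has to manage: a larger cap tends to raise the service rate at the price of more WIP. A state-dependent policy of the kind the RL agent searches for would replace the fixed-cap release rule with a decision that depends on the current buffer levels.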
