Application of Reinforcement Learning for the Generation of an Assembly Plant Entry Control Policy

The generation of an entry control policy for an assembly plant using a reinforcement learning agent is investigated. The assembly plant studied consists of ten workstations and produces three types of products. The objective of the entry control policy is to complete the required production within a planning horizon while following a given production mix. Because the state space is large, a neural-network-based function approximator is used to model the long-term reward function. The schedules generated by the trained agent are compared with those produced by a deterministic heuristic control policy previously developed for this assembly plant. Simulation results show that, under tight planning horizons, the reinforcement learning agent produces production plans with better productivity than the heuristic controller, while achieving a sub-optimal yet acceptable production mix balance.
