ADAPTIVE REACTIVE JOB-SHOP SCHEDULING WITH REINFORCEMENT LEARNING AGENTS

Traditional approaches to solving job-shop scheduling problems assume full knowledge of the problem and search for a centralized solution to a single problem instance. Finding optimal solutions, however, requires enormous computational effort, which becomes critical for large problem instances and, in particular, in situations where the environment changes frequently. In this article, we adopt an alternative view of production scheduling problems by modeling them as multi-agent reinforcement learning problems. Specifically, we interpret job-shop scheduling problems as sequential decision processes and attach to each resource an adaptive agent that makes its job dispatching decisions independently of the other agents and improves its dispatching behavior by trial and error, employing a reinforcement learning algorithm. The use of concurrently and independently learning agents requires special care in the design of the reinforcement learning algorithm to be applied. We therefore develop a novel multi-agent learning algorithm that combines data-efficient batch-mode reinforcement learning, neural network-based value function approximation, and an optimistic inter-agent coordination scheme. The evaluation of our learning framework focuses on numerous established Operations Research benchmark problems and shows that our approach competes well with alternative solution methods.
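The abstract is terse about what "optimistic inter-agent coordination" means in practice. The sketch below illustrates one plausible reading, following the optimistic-assumption update known from distributed Q-learning for cooperative multi-agent systems (Lauer and Riedmiller, 2000): each independent learner only ever raises a state-action value, attributing poor outcomes to the unobserved behavior of the other agents. This is a minimal tabular simplification for illustration only; the article itself uses batch-mode learning with neural network value function approximation, and all names here (`DispatchingAgent`, `choose_job`, the state/job encodings) are hypothetical, not taken from the article.

```python
import random
from collections import defaultdict

# Illustrative sketch: one dispatching agent per resource. A tabular Q-function
# stands in for the paper's neural value function approximation; 'state' would
# encode the situation at the resource (e.g. the set of waiting jobs), and an
# action is the job selected for processing next.
class DispatchingAgent:
    def __init__(self, epsilon=0.1, gamma=0.95):
        self.q = defaultdict(float)   # Q-values indexed by (state, job)
        self.epsilon = epsilon        # exploration rate
        self.gamma = gamma            # discount factor

    def choose_job(self, state, waiting_jobs):
        # Epsilon-greedy dispatching decision over the jobs in the queue.
        if random.random() < self.epsilon:
            return random.choice(waiting_jobs)
        return max(waiting_jobs, key=lambda job: self.q[(state, job)])

    def optimistic_update(self, state, job, reward, next_state, next_jobs):
        # Optimistic assumption for independent learners: never lower a
        # Q-value, so that low returns caused by other agents' exploratory
        # behavior do not corrupt an agent's own value estimate.
        best_next = max((self.q[(next_state, j)] for j in next_jobs),
                        default=0.0)
        target = reward + self.gamma * best_next
        self.q[(state, job)] = max(self.q[(state, job)], target)
```

In a simulated plant, each resource would call `choose_job` whenever it falls idle and `optimistic_update` after observing the resulting transition; the max-based update is what makes concurrently learning agents converge toward a jointly good dispatching policy despite never communicating.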
