Paper Information - A Reinforcement Learning Approach to Job-Shop Scheduling

A Reinforcement Learning Approach to Job-Shop Scheduling

We apply reinforcement learning methods to learn domain-specific heuristics for job-shop scheduling. A repair-based scheduler starts with a critical-path schedule and incrementally repairs constraint violations with the goal of finding a short conflict-free schedule. The temporal difference algorithm TD(λ) is applied to train a neural network to learn a heuristic evaluation function over states. This evaluation function is used by a one-step lookahead search procedure to find good solutions to new scheduling problems. We evaluate this approach on synthetic problems and on problems from a NASA space shuttle payload processing task. The evaluation function is trained on problems involving a small number of jobs and then tested on larger problems. The TD scheduler performs better than the best known existing algorithm for this task, Zweben's iterative repair method based on simulated annealing. The results suggest that reinforcement learning can provide a new method for constructing high-performance scheduling systems.
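The two mechanisms named in the abstract, TD(λ) training of an evaluation function and one-step lookahead over candidate repairs, can be illustrated with a minimal sketch. This is not the authors' implementation: it substitutes a linear evaluation function for the paper's neural network, and the function names, feature vectors, step sizes, and the convention that higher values mean better schedules are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, TypeVar

S = TypeVar("S")


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    """Inner product of a weight vector and a feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


def td_lambda_episode(
    weights: Sequence[float],
    features: Sequence[Sequence[float]],  # features of each non-terminal state visited
    rewards: Sequence[float],             # rewards[t] is received on leaving state t
    alpha: float = 0.1,                   # learning rate (assumed value)
    lam: float = 0.7,                     # trace decay, the lambda in TD(lambda) (assumed value)
    gamma: float = 1.0,                   # discount factor (assumed value)
) -> List[float]:
    """Run one episode of TD(lambda) with accumulating eligibility traces.

    Assumption: V(s) = w . phi(s) is linear, standing in for a neural network.
    The state after the last listed one is terminal and has value 0.
    """
    w = list(weights)
    trace = [0.0] * len(w)
    for t, reward in enumerate(rewards):
        v = dot(w, features[t])
        v_next = dot(w, features[t + 1]) if t + 1 < len(features) else 0.0
        delta = reward + gamma * v_next - v  # temporal-difference error
        # For a linear V, the gradient with respect to w is the feature vector.
        trace = [gamma * lam * e + x for e, x in zip(trace, features[t])]
        w = [wi + alpha * delta * e for wi, e in zip(w, trace)]
    return w


def one_step_lookahead(successors: Sequence[S], value: Callable[[S], float]) -> S:
    """Pick the successor state (for example, a repaired schedule) with the highest value."""
    return max(successors, key=value)
```

A possible use, again under the assumptions above: call `td_lambda_episode` once per training problem to update the weights, then at test time generate the schedules reachable by one repair and pass them to `one_step_lookahead` with `value=lambda s: dot(w, phi(s))`, where `phi` is a hypothetical feature extractor.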

Wei Zhang | Thomas G. Dietterich

[1] Dean Pomerleau, et al. Efficient Training of Artificial Neural Networks for Autonomous Navigation, 1991, Neural Computation.

[2] D. Pomerleau. Efficient Training of Artificial Neural Networks for Autonomous Navigation, 1991.

[3] Gerald Tesauro, et al. Practical Issues in Temporal Difference Learning, 1992, Mach. Learn.

[4] Steven J. Bradtke, et al. Reinforcement Learning Applied to Linear Quadratic Regulation, 1992, NIPS.

[5] Monte Zweben, et al. Scheduling and rescheduling with iterative repair, 1993, IEEE Trans. Syst. Man Cybern.

[6] Andrew W. Moore, et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function, 1994, NIPS.