We consider closed-loop solutions to stochastic optimization problems of the resource allocation type. These problems concern the dynamic allocation of reusable resources over time to non-preemptive, interconnected tasks with stochastic durations. The aim is to minimize the expected value of a regular performance measure. First, we formulate the problem as a stochastic shortest path problem and argue that our formulation has favorable properties: it has a finite horizon and it is acyclic, so all policies are proper, and the space of control policies can be safely restricted. Then, we propose an iterative solution. Essentially, we apply a reinforcement learning based adaptive sampler to compute a suboptimal control policy. We suggest several approaches to enhance this solution and make it applicable to large-scale problems. The main improvements are: (1) the value function is maintained by feature-based support vector regression; (2) the initial exploration is guided by rollout algorithms; (3) the state space is partitioned by clustering the tasks while keeping the precedence constraints satisfied; (4) the action space is decomposed and, consequently, the number of actions available in a state is decreased; and, finally, (5) we argue that the sampling can be effectively distributed among several processors. The effectiveness of the approach is demonstrated by experimental results on both artificial (benchmark) and real-world (industry-related) data.
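To make the problem setting concrete, the following minimal sketch simulates a closed-loop dispatching policy on a toy instance and estimates its expected makespan by Monte Carlo sampling. It is not the method summarized above (there is no reinforcement learning, support vector regression, clustering, or action decomposition here). The names `simulate`, `expected_makespan`, and `first_ready`, the use of identical interchangeable resources, the uniformly sampled discrete durations, and the choice of makespan as the regular performance measure are all assumptions made for illustration.

```python
import random


def simulate(tasks, preds, durations, n_resources, policy, rng):
    """Run one episode and return the makespan.

    tasks:       list of task ids
    preds:       dict task -> set of tasks that must finish first (acyclic)
    durations:   dict task -> list of possible durations, sampled uniformly
    n_resources: number of identical reusable resources (assumption)
    policy:      function(ready_tasks) -> task to start next
    """
    finished = set()
    unstarted = set(tasks)
    running = {}  # task -> completion time
    time = 0
    while len(finished) < len(tasks):
        # Closed-loop decision: start ready tasks while a resource is idle.
        ready = sorted(t for t in unstarted if preds[t] <= finished)
        while ready and len(running) < n_resources:
            task = policy(ready)
            ready.remove(task)
            unstarted.remove(task)
            # Non-preemptive: once started, the task holds its resource
            # until its (random) duration has elapsed.
            running[task] = time + rng.choice(durations[task])
        # Advance the clock to the next completion and release resources.
        time = min(running.values())
        for task in [t for t, end in running.items() if end == time]:
            del running[task]
            finished.add(task)
    return time


def expected_makespan(tasks, preds, durations, n_resources, policy,
                      n_samples=1000, seed=0):
    """Monte Carlo estimate of the expected makespan under a fixed policy."""
    rng = random.Random(seed)
    total = sum(
        simulate(tasks, preds, durations, n_resources, policy, rng)
        for _ in range(n_samples)
    )
    return total / n_samples


def first_ready(ready):
    """Hypothetical base policy: start the first ready task in sorted order."""
    return ready[0]


if __name__ == "__main__":
    tasks = ["a", "b", "c"]
    preds = {"a": set(), "b": set(), "c": {"a", "b"}}
    durations = {"a": [1, 3], "b": [2, 4], "c": [1]}
    print(expected_makespan(tasks, preds, durations, 2, first_ready))
```

Every step of an episode either starts or finishes a task and no task is ever restarted, so each episode terminates after finitely many decisions. This is the sense in which such a formulation is acyclic with a finite horizon. An estimator of this kind could serve as the evaluation step inside a rollout scheme, but that connection is only suggested here, not implemented.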