Production scheduling, the problem of sequentially configuring a factory to meet forecasted demands, is a critical problem throughout the manufacturing industry. The requirement of maintaining product inventories in the face of unpredictable demand and stochastic factory output makes standard scheduling models, such as job-shop, inadequate. Currently applied algorithms, such as simulated annealing and constraint propagation, must employ ad hoc methods such as frequent replanning to cope with uncertainty. In this paper, we describe a Markov Decision Process (MDP) formulation of production scheduling which captures stochasticity in both production and demands. The solution to this MDP is a value function which can be used to generate optimal scheduling decisions online. A simple example illustrates the theoretical superiority of this approach over replanning-based methods. We then describe an industrial application and two reinforcement learning methods for generating an approximate value function on this domain. Our results demonstrate that in both deterministic and noisy scenarios, value function approximation is an effective technique.
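To make the MDP framing concrete, here is a minimal, hypothetical sketch (not taken from the paper): a one-product factory whose state is the inventory level, whose action is how many units to produce this period, and whose demand is stochastic. Value iteration computes the value function, and the greedy policy over it yields online scheduling decisions. All constants, names, and costs below are illustrative assumptions.

```python
# Toy production-scheduling MDP solved by value iteration.
# Hypothetical parameters; the paper's industrial domain is far larger
# and uses value function approximation instead of exact tabulation.

MAX_INV = 5                      # inventory capacity (states 0..MAX_INV)
ACTIONS = (0, 1)                 # units to produce this period
DEMAND_PROBS = {0: 0.5, 1: 0.5}  # stochastic demand distribution
HOLDING_COST = 1.0               # cost per unit held at end of period
STOCKOUT_COST = 10.0             # penalty per unit of unmet demand
GAMMA = 0.95                     # discount factor

def expected_value(inv, action, V):
    """One-step expected reward plus discounted next-state value."""
    total = 0.0
    for demand, p in DEMAND_PROBS.items():
        stock = min(inv + action, MAX_INV)   # production bounded by capacity
        unmet = max(demand - stock, 0)       # demand we fail to serve
        next_inv = max(stock - demand, 0)    # inventory carried forward
        reward = -(HOLDING_COST * next_inv + STOCKOUT_COST * unmet)
        total += p * (reward + GAMMA * V[next_inv])
    return total

def value_iteration(tol=1e-6):
    """Iterate the Bellman optimality backup until convergence."""
    V = [0.0] * (MAX_INV + 1)
    while True:
        V_new = [max(expected_value(s, a, V) for a in ACTIONS)
                 for s in range(MAX_INV + 1)]
        if max(abs(a - b) for a, b in zip(V, V_new)) < tol:
            return V_new
        V = V_new

def greedy_policy(V):
    """Online scheduling decision: the action maximizing expected value."""
    return [max(ACTIONS, key=lambda a: expected_value(s, a, V))
            for s in range(MAX_INV + 1)]

V = value_iteration()
policy = greedy_policy(V)
```

Because the stockout penalty dominates the holding cost, the greedy policy produces whenever inventory is empty; unlike a replanning heuristic, the decision already accounts for the full demand distribution rather than a single forecast.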