This paper describes the application of reinforcement learning (RL) to the difficult real-world problem of elevator dispatching. The elevator domain poses a combination of challenges not seen in most RL research to date. Elevator systems operate in continuous state spaces and in continuous time as discrete-event dynamic systems. Their states are not fully observable, and they are nonstationary due to changing passenger arrival rates. In addition, we use a team of RL agents, each of which is responsible for controlling one elevator car. The team receives a global reinforcement signal that appears noisy to each agent due to the effects of the other agents' actions, the random nature of the arrivals, and the incomplete observation of the state. In spite of these complications, we present simulation results that surpass the best of the heuristic elevator control algorithms of which we are aware. These results demonstrate the power of RL on a very large-scale stochastic dynamic optimization problem of practical utility.
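To make the team setup concrete, the following is a minimal sketch of independent per-car Q-learning agents trained on a shared global reward. It is illustrative only: the tabular, discrete-time update, the observation encoding, and the two-action set are placeholder assumptions, not the paper's method (which uses neural-network Q-functions and continuous-time discounted returns).

```python
import random
from collections import defaultdict

# Hypothetical sketch: one independent Q-learning agent per elevator car,
# all updated from the same global (team) reward signal. State encoding,
# actions, and hyperparameters are placeholders, not taken from the paper.

ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1
ACTIONS = ("stop", "continue")  # simplified per-car decision set

class CarAgent:
    def __init__(self):
        # Q-values over (observation, action) pairs; each agent sees only
        # its own partial observation of the overall system state.
        self.q = defaultdict(float)

    def act(self, obs):
        # Epsilon-greedy action selection.
        if random.random() < EPSILON:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(obs, a)])

    def update(self, obs, action, global_reward, next_obs):
        # Every agent bootstraps from the same team reward, so credit for
        # the other cars' actions shows up as noise in its own target.
        best_next = max(self.q[(next_obs, a)] for a in ACTIONS)
        target = global_reward + GAMMA * best_next
        self.q[(obs, action)] += ALPHA * (target - self.q[(obs, action)])
```

The shared-reward structure is the point of the sketch: each agent's update uses a target it cannot fully predict from its own observation and action, which is exactly the noisy-credit-assignment difficulty described above.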