This paper formulates the optimal control problem for a class of mathematical models in which the system to be controlled is characterized by a finite-state discrete-time Markov process. The states of this internal process are not directly observable by the controller; instead, the controller has access to a set of observable outputs that are only probabilistically related to the internal state of the system. The formulation is illustrated by a simple machine-maintenance example, and other specific application areas are also discussed. The paper demonstrates that, if only a finite number of control intervals remain, the optimal payoff function is a piecewise-linear, convex function of the current state probabilities of the internal Markov process. An algorithm that exploits this property to compute the optimal control policy and payoff function for any finite horizon is also outlined. These results are illustrated by a numerical example for the machine-maintenance problem.
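To make the piecewise-linear convexity result concrete, the sketch below performs the exact finite-horizon dynamic-programming backup by brute-force enumeration of "alpha vectors" (the linear pieces whose upper surface is the optimal payoff function over the belief simplex) for a hypothetical two-state machine-maintenance model. All transition, observation, and reward numbers are invented for illustration, and the naive enumeration omits the pruning of dominated pieces that a practical implementation needs; it is a minimal demonstration of the representation, not the paper's algorithm.

```python
import itertools
import numpy as np

# Hypothetical two-state machine-maintenance POMDP.
# Internal states: 0 = working, 1 = broken (not directly observed).
T = {  # T[a][s, s']: transition probabilities under action a
    "run":    np.array([[0.9, 0.1], [0.0, 1.0]]),
    "repair": np.array([[1.0, 0.0], [1.0, 0.0]]),
}
O = {  # O[a][s', z]: probability of output z after reaching state s'
    "run":    np.array([[0.8, 0.2], [0.3, 0.7]]),
    "repair": np.array([[0.8, 0.2], [0.3, 0.7]]),
}
R = {  # R[a][s]: expected immediate payoff of action a in state s
    "run":    np.array([1.0, -1.0]),
    "repair": np.array([-0.5, -0.5]),
}
OBS = [0, 1]  # observable outputs: 0 = good item, 1 = defective item

def backup(alphas):
    """One dynamic-programming step: build the linear pieces of the
    n-step payoff function from those of the (n-1)-step function."""
    new = []
    for a in T:
        # g[z][i](s) = sum_{s'} T[a][s,s'] O[a][s',z] alpha_i(s'):
        # contribution of output z if piece alpha_i is followed after.
        g = {z: [T[a] @ (O[a][:, z] * alpha) for alpha in alphas]
             for z in OBS}
        # Cross-sum: one continuation piece chosen per output.
        for choice in itertools.product(range(len(alphas)), repeat=len(OBS)):
            new.append(R[a] + sum(g[z][i] for z, i in zip(OBS, choice)))
    return new

# Three backups from the zero terminal payoff, then evaluate a belief:
# the payoff is the maximum of the linear pieces, hence PWLC.
pieces = [np.zeros(2)]
for _ in range(3):
    pieces = backup(pieces)

belief = np.array([0.6, 0.4])  # P(working), P(broken)
print(max(float(p @ belief) for p in pieces))  # optimal 3-step payoff
```

Note that the number of linear pieces grows combinatorially under this enumeration (here 1, 2, 8, 128 over three backups), which is why an efficient algorithm must retain only the pieces that actually form the convex upper surface.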