MARKOV-RENEWAL PROGRAMMING

Abstract: A special structure in dynamic programming is the problem of programming over a Markov chain. This paper extends the solution algorithms to programming over a Markov-renewal process, in which times between transitions of the system from state i to state j are independent samples from an inter-transition distribution which may depend on both i and j. For these processes, a general reward structure and a decision mechanism are postulated; the problem is to make decisions at each transition so as to maximize the total expected reward at the end of the planning horizon. For finite-horizon problems, or infinite-horizon problems with discounting, there is no difficulty; the results are similar to previous work, except for a new dependence upon the transition-time distributions which is generally present. When the horizon extends towards infinity, or when discounting vanishes, however, a fundamental dichotomy in the optimal solutions may occur. It then becomes important to specify whether the limiting experiment is: (i) undiscounted, with the number of transitions n approaching infinity; (ii) undiscounted, with a time horizon t approaching infinity; or (iii) infinite n or t, with discount factor a approaching zero. In each case, a limiting form for the total expected reward is exhibited, and an algorithm is developed to maximize the rate of return. The problem of finding optimal or near-optimal policies in the case of ties in rate of return remains computationally unresolved. Extensions to non-ergodic processes are indicated, and special results for the two-state process are presented. Finally, an example of machine maintenance and repair is used to illustrate the generality of the approach and the special problems which may arise.
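To make the process described above concrete, the following is a minimal simulation sketch (not the paper's optimization algorithm): a system moves between states according to transition probabilities, with inter-transition times drawn from a distribution that may depend on both the current state i and the next state j, accumulating reward until a fixed time horizon. All names here (`simulate_markov_renewal`, `sample_time`, `reward`) are illustrative assumptions, not notation from the paper.

```python
import random

def simulate_markov_renewal(p, sample_time, reward, horizon, start=0, rng=None):
    """Simulate one trajectory of a Markov-renewal process up to time `horizon`.

    p[i]            -- list of transition probabilities out of state i
    sample_time     -- sample_time(i, j, rng): draws an inter-transition time
                       whose distribution may depend on both i and j
    reward(i, j, t) -- reward earned on a transition i -> j taking time t
    Returns (total_reward, number_of_transitions).
    """
    rng = rng or random.Random()
    state, clock, total, n = start, 0.0, 0.0, 0
    while True:
        # Choose the next state j from the embedded Markov chain.
        j = rng.choices(range(len(p[state])), weights=p[state])[0]
        # Draw the holding time for this particular i -> j transition.
        t = sample_time(state, j, rng)
        if clock + t > horizon:  # next transition would fall past the horizon
            return total, n
        clock += t
        total += reward(state, j, t)
        state, n = j, n + 1
```

With a degenerate (deterministic) holding time of 1 and unit rewards on a two-state alternating chain, `simulate_markov_renewal([[0, 1], [1, 0]], lambda i, j, r: 1.0, lambda i, j, t: 1.0, 5.5)` completes five transitions before the horizon; general distributions are obtained by making `sample_time` draw from, e.g., `rng.expovariate`.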