Markov-Renewal Programming. II: Infinite Return Models, Example

This paper continues a previous one that investigates programming over a Markov-renewal process, in which the intervals between transitions of a system from state i to state j are independent samples from a distribution that may depend upon both i and j. Given a reward structure, and a decision mechanism that influences both the rewards and the Markov-renewal process, the problem is to select alternatives at each transition so as to maximize the total expected reward. The first part of the paper investigated various finite-return models. In this part, we investigate the infinite-return models, where it becomes necessary to consider only stationary policies that maximize the dominant term in the reward. It is then important to specify whether the limiting experiment is (I) undiscounted, with the number of transitions n → ∞; (II) undiscounted, with a time horizon t → ∞; or (III) discounted, over infinite n or t, with discount factor a → 0. In each case, a limiting form for the total expected reward is derived, and an algorithm is developed to maximize the rate of return. The problem of finding optimal or near-optimal policies in the case of ties remains computationally unresolved. Extensions to nonergodic processes are indicated, and special results for the two-state process are presented. Finally, an example of machine maintenance and repair illustrates the generality of the models and the special problems that may arise.
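The rate-of-return objective can be illustrated with a minimal sketch. For a hypothetical two-state Markov-renewal decision problem (every probability, sojourn time, and reward below is invented for illustration), a stationary policy fixes one alternative per state; its long-run reward per unit time is g = Σᵢ πᵢ rᵢ / Σᵢ πᵢ τᵢ, where π is the stationary distribution of the embedded Markov chain, rᵢ the expected reward per transition, and τᵢ the expected sojourn time. The algorithm developed in the paper is an iterative policy-improvement scheme; the brute-force enumeration here is only a toy stand-in that is feasible for a problem this small.

```python
import itertools
import numpy as np

# Hypothetical two-state problem: in each state i we may pick one of two
# alternatives, each given as (embedded-chain transition row, expected
# sojourn time tau_i, expected one-transition reward r_i).
alternatives = {
    0: [(np.array([0.5, 0.5]), 1.0, 4.0),   # state 0, alternative a
        (np.array([0.9, 0.1]), 2.0, 6.0)],  # state 0, alternative b
    1: [(np.array([0.3, 0.7]), 1.5, 2.0),   # state 1, alternative a
        (np.array([0.8, 0.2]), 1.0, 5.0)],  # state 1, alternative b
}

def gain(policy):
    """Long-run reward rate g under a stationary policy (k0, k1)."""
    P = np.array([alternatives[i][policy[i]][0] for i in (0, 1)])
    tau = np.array([alternatives[i][policy[i]][1] for i in (0, 1)])
    r = np.array([alternatives[i][policy[i]][2] for i in (0, 1)])
    # Stationary distribution of the embedded chain: pi P = pi, sum(pi) = 1,
    # solved as an overdetermined least-squares system.
    A = np.vstack([(P - np.eye(2)).T, np.ones(2)])
    pi = np.linalg.lstsq(A, np.array([0.0, 0.0, 1.0]), rcond=None)[0]
    return (pi @ r) / (pi @ tau)

# Enumerate all four stationary policies and keep the one with maximal g.
best = max(itertools.product(range(2), repeat=2), key=gain)
print(best, round(gain(best), 3))  # → (0, 1) 4.385
```

For this data the maximizing policy takes the patient, high-reward alternative in state 1 and the short-sojourn alternative in state 0; note that g trades reward per transition against time per transition, which is exactly why the limiting experiment (transitions n → ∞ versus time t → ∞) must be specified.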