On continuous dynamic programming with discrete time-parameter

A rigorous foundation of stochastic dynamic programming was given by Blackwell [2] and Strauch [11], who treated stationary models. The decision model taken as the basis of the present work is a slight generalization of the model of Blackwell and Strauch in that the discount factor is allowed to depend on the state of the system and on the selected action. We thereby include models arising from Markov renewal processes (semi-Markov processes) as well as from stopping and search problems.

This model is a special case of the nonstationary decision model defined by Hinderer [5], but it preserves the stationary structure of the model of Blackwell and Strauch. Thus, on the one hand, a number of results obtained by Hinderer [5, 6], e.g. the universal measurability of the optimal return and the validity of the optimality equation, apply to our model. On the other hand, results of Blackwell and Strauch [2, 3, 11] concerning the stationary character, e.g. the optimality of stationary plans, can be generalized to our model using many of their ideas. In [9] it was investigated to what extent it is justified to confine attention to stationary plans.

The main purpose of the present paper is to give sufficient conditions for the existence of optimal and ε-optimal plans. We assume that the reward, the discount factor, and the transition law depend continuously on the actions. Then, under certain convergence conditions on the expected total return under admissible plans, there exists a stationary ε-optimal plan and, if in addition the sets of admissible actions are compact, there exists a stationary optimal plan. Similar results were obtained by Maitra [7, 8], who on the one hand assumes only a weaker form of continuity (more precisely, upper semicontinuity) but on the other hand requires the reward and the transition law to depend continuously on both the states and the actions.

As to the convergence conditions imposed on the expected return, we essentially treat the negative bounded case (in the terminology of Strauch). However, the assumption that the reward is negative will be weakened considerably, so that the so-called discounted case is included. These conditions were found by Hinderer [6] for the more general nonstationary model and are adjusted here to the model of the present work.
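To fix ideas, the optimality equation for a model of this kind can be written as follows; the notation is generic and chosen only for illustration (state space $S$, sets $D(s)$ of admissible actions, reward $r$, discount factor $\beta$, transition law $q$), not necessarily that used in the body of the paper:

$$ V(s) \;=\; \sup_{a \in D(s)} \Big[\, r(s,a) + \beta(s,a) \int V(s')\, q(ds' \mid s,a) \Big], \qquad s \in S, $$

where $V$ denotes the optimal return. A plan is then called ε-optimal if its expected total return falls short of $V$ by at most ε at every state.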
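As a purely illustrative aside, the following self-contained Python sketch carries out value iteration for a finite-state, finite-action instance of such a model, with a discount factor depending on both state and action; all numerical data are invented, and the finite setting is only a caricature of the measurable-space framework treated in the paper. It shows how a stationary plan is obtained by selecting, in each state, an action attaining the maximum in the optimality equation (here the finiteness of the action sets plays the role of the compactness assumption).

    import numpy as np

    # Finite-state, finite-action caricature of a decision model whose
    # discount factor beta(s, a) depends on the state s and the action a.
    # All data below are invented for the example; the paper's setting
    # (general measurable state space) is not captured by this sketch.
    n_states, n_actions = 3, 2
    rng = np.random.default_rng(0)

    r = rng.uniform(-1.0, 0.0, size=(n_states, n_actions))    # negative rewards
    beta = rng.uniform(0.5, 0.9, size=(n_states, n_actions))  # discount factor beta(s, a)
    # transition law q(s' | s, a): one probability vector per state-action pair
    q = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))

    def bellman(v):
        # Q(s, a) = r(s, a) + beta(s, a) * sum_{s'} q(s' | s, a) * v(s')
        return r + beta * (q @ v)

    v = np.zeros(n_states)
    for _ in range(10_000):                      # value iteration
        v_new = bellman(v).max(axis=1)
        if np.max(np.abs(v_new - v)) < 1e-12:
            break
        v = v_new

    # A stationary plan uses the same decision rule at every stage:
    # in each state, pick an action attaining the maximum.
    plan = bellman(v).argmax(axis=1)
    print("optimal return:", v)
    print("stationary optimal plan:", plan)

Since the discount factor is bounded away from 1 here, the iteration converges, and the attainment of the maximum over the finite action sets is what yields a stationary optimal plan rather than merely ε-optimal ones.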