A MODIFIED FORM OF THE ITERATIVE METHOD OF DYNAMIC PROGRAMMING

This paper considers the discrete-time, finite-state Markovian decision problem with the average return criterion. A modified form of the iterative method of dynamic programming is studied. Under the assumption that the maximal average return is independent of the initial state, the asymptotic behaviour of the sequence of functions generated by this modified method is found. It is shown that the modified iterative method supplies both upper and lower bounds on the maximal average return, as well as ε-optimal policies. Moreover, a convergence result is proved for the policies produced by the modified iterative method.
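To make the role of the bounds concrete, the following is a minimal sketch of the standard (unmodified) iterative method, not the modified method studied in the paper, whose details are not given in this abstract. It assumes that the maximal average return is independent of the initial state, in which case the smallest and largest components of the one-step difference between successive iterates bracket the maximal average return. The function names `bellman_step` and `gain_bounds`, the stopping tolerance, and the two-state example are all hypothetical and chosen only for illustration.

```python
def bellman_step(v, rewards, transitions):
    """Apply one step of standard value iteration.

    rewards[i][a] is the one-step return for action a in state i and
    transitions[i][a][j] is the probability of moving from i to j under a.
    Returns the new value vector and the maximizing (greedy) policy.
    """
    new_v, policy = [], []
    for i in range(len(v)):
        best_a, best_val = None, float("-inf")
        for a, r in enumerate(rewards[i]):
            val = r + sum(p * v[j] for j, p in enumerate(transitions[i][a]))
            if val > best_val:
                best_a, best_val = a, val
        new_v.append(best_val)
        policy.append(best_a)
    return new_v, policy


def gain_bounds(rewards, transitions, eps=1e-6, max_iter=10_000):
    """Iterate until the bounds on the maximal average return differ by < eps.

    Assumption: the maximal average return does not depend on the initial
    state, so min and max of (Tv - v) are lower and upper bounds on it.
    """
    v = [0.0] * len(rewards)
    for _ in range(max_iter):
        new_v, policy = bellman_step(v, rewards, transitions)
        diffs = [x - y for x, y in zip(new_v, v)]
        lower, upper = min(diffs), max(diffs)
        if upper - lower < eps:
            break
        # Subtract a constant to keep the iterates bounded; this shifts v
        # and Tv equally, so the bounds are unaffected.
        v = [x - new_v[0] for x in new_v]
    return lower, upper, policy


# Hypothetical two-state example: state 0 has two actions, state 1 has one.
rewards = [[1.0, 2.0], [0.0]]
transitions = [
    [[0.5, 0.5], [0.1, 0.9]],
    [[0.5, 0.5]],
]
lower, upper, policy = gain_bounds(rewards, transitions)
```

In this example the policy that always chooses the second action in state 0 has stationary distribution (5/14, 9/14) and average return 5/7, which lies between the returned bounds. Once the bounds are within `eps` of each other, the greedy policy from the last step is ε-optimal in the sense used above, with ε equal to `eps`.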
