Universally Measurable Policies in Dynamic Programming

Powerful dynamic programming results concerning the existence and characterization of optimal or nearly optimal policies, the convergence of algorithms, and the characterization of the optimal cost function have been available for some time (see Bellman [Bellman, R. 1957. Dynamic Programming. Princeton University Press, Princeton, NJ.]), but rigorous proofs of these results have required quite restrictive hypotheses, such as countability of the state space, in order to circumvent the inherent measurability difficulties. Some of these results, or weaker versions of them, have been proved in the framework of Borel spaces and Borel measurable policies by Blackwell [Blackwell, D. 1965. Positive dynamic programming. Proc. Fifth Berkeley Symposium on Math. Stat. and Prob. 415–418; Blackwell, D. 1965. Discounted dynamic programming. Ann. Math. Stat. 36 226–235.], Strauch [Strauch, R. E. 1966. Negative dynamic programming. Ann. Math. Stat. 37 871–890.], Hinderer [Hinderer, K. 1970. Foundations of Nonstationary Dynamic Programming with Discrete Time Parameter. Springer, New York.], Dynkin and Juskevic [Dynkin, E. B., A. A. Juskevic. 1975. Controlled Markov Processes and Their Applications. Moscow. English translation to be published by Springer.], and others. We show that the use of universally measurable policies in the Borel space framework resolves the measurability issues, so that all the basic results of dynamic programming can be obtained in the strongest possible form. In particular, ε-optimal policies are shown to exist, the dynamic programming algorithm is defined, and conditions and bounds for its convergence to the optimal cost are given. The optimality equation is shown to hold and is used to characterize the optimal cost function and optimal policies.
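As a rough illustration of the dynamic programming algorithm and the kind of convergence bound mentioned above, the following minimal sketch runs the iteration J_{k+1} = T(J_k) on a finite discounted-cost model. It is not the Borel space setting of the paper: with finite state and control sets no measurability question arises, and the discounted-cost assumption, the function name `value_iteration`, the dictionary layout of `cost` and `trans`, and the two-state example are all hypothetical choices made for illustration. The stopping rule uses the standard contraction bound sup|J_{k+1} - J*| <= alpha/(1 - alpha) * sup|J_{k+1} - J_k|.

```python
def value_iteration(states, controls, cost, trans, alpha, tol=1e-10, max_iter=10_000):
    """Iterate J_{k+1} = T(J_k) from J_0 = 0 on a finite model (illustrative only).

    cost[x][u]  : one-stage cost of control u at state x (assumed layout)
    trans[x][u] : dict mapping next state y to its probability (assumed layout)
    alpha       : discount factor, assumed to satisfy 0 < alpha < 1
    """
    def q(J, x, u):
        # Expected cost of applying u at x and then paying J from the next state.
        return cost[x][u] + alpha * sum(p * J[y] for y, p in trans[x][u].items())

    J = {x: 0.0 for x in states}
    for _ in range(max_iter):
        J_next = {x: min(q(J, x, u) for u in controls) for x in states}
        gap = max(abs(J_next[x] - J[x]) for x in states)
        J = J_next
        # Contraction bound: the distance from J to the optimal cost is at most this.
        if alpha / (1.0 - alpha) * gap <= tol:
            break

    # A policy that attains the minimum in the optimality equation at every state.
    policy = {x: min(controls, key=lambda u: q(J, x, u)) for x in states}
    return J, policy


# Hypothetical two-state example with deterministic transitions.
states = [0, 1]
controls = ["stay", "go"]
cost = {0: {"stay": 1.0, "go": 2.0}, 1: {"stay": 0.0, "go": 3.0}}
trans = {
    0: {"stay": {0: 1.0}, "go": {1: 1.0}},
    1: {"stay": {1: 1.0}, "go": {0: 1.0}},
}
J, policy = value_iteration(states, controls, cost, trans, alpha=0.9)
print(J, policy)
```

In this toy model, staying at state 0 forever costs 1/(1 - 0.9) = 10, while paying 2 once to move to the cost-free state 1 costs 2 in total, so the iteration settles on the control "go" at state 0 and "stay" at state 1.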

[1] Abraham Wald, et al. Statistical Decision Functions, 1951.

[2] P. Whittle, et al. Markov Processes and Their Applications, 1962, Nature.

[3] D. Blackwell. Discounted Dynamic Programming, 1965.

[4] R. Bellman. Dynamic Programming, 1957, Science.

[5] R. Strauch. Negative Dynamic Programming, 1966.

[6] K. Parthasarathy, et al. Probability Measures on Metric Spaces, 1967.

[7] David Blackwell, et al. Positive Dynamic Programming, 1967.

[8] K. Hinderer, et al. Foundations of Non-stationary Dynamic Programming with Discrete Time Parameter, 1970.

[9] A. Zvonkin. On Sequentially Controlled Markov Processes, 1971.

[10] N. Furukawa. Markovian Decision Processes with Compact Action Spaces, 1972.

[11] M. Schäl. On Continuous Dynamic Programming with Discrete Time-Parameter, 1972.

[12] P. Meyer, et al. Reduites et jeux de hasard, 1973.

[13] L. Brown, et al. Measurable Selections of Extrema, 1973.

[14] D. Blackwell, et al. The Optimal Reward Operator in Dynamic Programming, 1974.

[15] D. Freedman. The Optimal Reward Operator in Special Classes of Dynamic Programming Problems, 1974.

[16] M. Schäl. A Selection Theorem for Optimization Problems, 1974.

[17] M. Schäl. On Dynamic Programming: Compactness of the Space of Policies, 1975.

[18] Charlotte Striebel, et al. Optimal Control of Discrete Time Stochastic Systems, 1975.

[19] Manfred Schäl, et al. Conditions for Optimality in Dynamic Programming and for the Limit of n-Stage Optimal Policies to Be Optimal, 1975.

[20] T. Parthasarathy, et al. Optimal Plans for Dynamic Programming Problems, 1976, Math. Oper. Res.

[21] L. J. Savage, et al. Inequalities for Stochastic Processes: How to Gamble If You Must, 1976.

[22] D. Bertsekas. Monotone Mappings with Application in Dynamic Programming, 1977.

[23] D. Bertsekas, et al. Alternative Theoretical Frameworks for Finite Horizon Discrete-Time Stochastic Optimal Control, 1977, 1977 IEEE Conference on Decision and Control including the 16th Symposium on Adaptive Processes and A Special Symposium on Fuzzy Set Theory and Applications.

[24] A. Barbarosie. On the Theory of Controlled Markov Processes, 1977.

[25] Dimitri P. Bertsekas, et al. Dynamic Programming and Stochastic Control, 1977, IEEE Transactions on Systems, Man, and Cybernetics.