On the Optimality of Structured Policies in Countable Stage Decision Processes
Multi-stage decision processes are considered, in notation that is an outgrowth of that introduced by Denardo [Denardo, E. 1967. Contraction mappings in the theory underlying dynamic programming. SIAM Rev. 9 165-177]. Certain Markov decision processes, stochastic games, and risk-sensitive Markov decision processes can be formulated in this notation. We identify conditions sufficient to prove that, in infinite horizon nonstationary processes, the optimal infinite horizon (present) value exists, is uniquely defined, is what is called "structured," and can be found by solving Bellman's optimality equations; that $\epsilon$-optimal strategies exist; that an optimal strategy can be found by applying Bellman's optimality criterion; and that a specially identified kind of policy, called a "structured" policy, is optimal in each stage.

A link is thus drawn between (i) studies such as those of Blackwell [Blackwell, D. 1965. Discounted dynamic programming. Ann. Math. Stat. 36 226-235] and Strauch [Strauch, R. 1966. Negative dynamic programming. Ann. Math. Stat. 37 871-890], where general policies for general processes are considered, and (ii) other studies, such as those of Scarf [Scarf, H. 1963. The optimality of (S, s) policies in the dynamic inventory problem. H. Scarf, D. Gilford, M. Shelly, eds. Mathematical Methods in the Social Sciences. Stanford University Press, Stanford] and Derman [Derman, C. 1963. On optimal replacement rules when changes of state are Markovian. R. Bellman, ed. Mathematical Optimization Techniques. University of California Press, Berkeley], where structured policies for special processes are considered. Those familiar with dynamic programming models (e.g., inventory, queueing optimization, replacement, optimal stopping) will be well acquainted with the use of what we call structured policies and value functions.

The infinite stage results are built on finite stage results. Results for the stationary infinite horizon case are also included. As an application, we provide conditions sufficient to prove that an optimal stationary strategy exists in a discounted stationary risk-sensitive Markov decision process with constant risk aversion. In Porteus [Porteus, E. On the optimality of structured policies in countable stage decision processes. Research Paper No. 141, Graduate School of Business, Stanford University, 71 pp., 1973, 1974; unabridged version of the present paper], of which this is a condensation, we also (i) show how known conditions under which a Borel measurable policy is optimal in an infinite horizon, nonstationary Markov decision process fit into our framework, and (ii) provide conditions under which a generalized (s, S) policy [Porteus, E. 1971. On the optimality of generalized (s, S) policies. Management Sci. 17 411-426] is optimal in an infinite horizon nonstationary inventory process.
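To fix ideas, Bellman's optimality equations in a Denardo-style contraction framework take roughly the following form (a generic sketch, not the paper's own notation): a local income function $h(s, a, v)$ gives the return from decision $a$ in state $s$ when $v$ estimates the value of the remaining stages, and the optimal value $v^*$ is a fixed point of the induced maximization operator,
\[
  v^*(s) \;=\; \sup_{a \in A(s)} h(s, a, v^*), \qquad s \in S.
\]
When $h$ is a contraction in $v$, as in the discounted case $h(s, a, v) = r(s, a) + \beta \sum_{s'} p(s' \mid s, a)\, v(s')$ with $0 \le \beta < 1$, Banach's fixed-point theorem yields existence and uniqueness of $v^*$ and convergence of the successive approximations $v_{n+1} = \sup_{a} h(\cdot, a, v_n)$, which is the standard route from finite stage results to infinite stage ones.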
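For the risk-sensitive application, constant risk aversion is standardly modeled with an exponential utility $u(x) = -e^{-\gamma x}$, $\gamma > 0$. As a generic sketch, the undiscounted $n$-stage risk-sensitive recursion in certainty-equivalent form reads
\[
  v_n(s) \;=\; \sup_{a \in A(s)} \Big\{ r(s, a) \;-\; \tfrac{1}{\gamma} \log \sum_{s'} p(s' \mid s, a)\, e^{-\gamma\, v_{n-1}(s')} \Big\};
\]
how discounting enters this recursion, and the exact conditions under which a stationary strategy attains the supremum, are the questions the paper's framework addresses.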
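Finally, as a concrete instance of a structured policy, the classical $(s, S)$ inventory rule orders nothing when the inventory level $x$ is at least $s$ and otherwise orders up to $S$:
\[
  a(x) \;=\;
  \begin{cases}
    S - x, & x < s, \\
    0,     & x \ge s.
  \end{cases}
\]
A structured optimality result of the kind proved here asserts that, under suitable conditions, some policy of this special form is optimal in every stage; the generalized $(s, S)$ policies of Porteus (1971) extend this form, with the precise definition given in the cited paper.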