The Shift-Function Approach for Markov Decision Processes with Unbounded Returns

Abstract: We study a discrete-time Markov decision process with general state and action spaces. The objective is to maximize the expected total return over a finite or infinite horizon. The transition probability measure is allowed to be defective, so that the model includes discounting, state- and action-dependent transition times (semi-Markov decision processes), and stopping problems. Motivated by applications to the control of queues and inventory systems, we develop a set of conditions on the one-period return function, the transition probabilities and the terminal value function that guarantee uniform convergence (with respect to the sup norm) of the finite-horizon optimal value functions to the infinite-horizon optimal value function (successive approximations). For the applications we have in mind, these conditions are substantially weaker and more realistic than those of the classical discounted model with bounded returns. (Author)
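For readers who want a concrete picture of "successive approximations" with a defective transition law, here is a minimal sketch. It assumes a finite state and action space with bounded returns, i.e. the classical setting the abstract contrasts with; it does not implement the authors' shift-function conditions for unbounded returns. The function names (`bellman`, `successive_approximations`, `sup_norm_distance`) and the toy numbers in `EXAMPLE_R` and `EXAMPLE_P` are hypothetical and chosen only for illustration. Each row of the transition array sums to less than one, and the missing probability mass plays the role of discounting or stopping.

```python
from typing import List, Sequence, Tuple

# Hypothetical toy model: r[s][a] is the one-period return, P[s][a][j] the
# (defective) probability of moving from state s to state j under action a.
Returns = Sequence[Sequence[float]]
Transitions = Sequence[Sequence[Sequence[float]]]


def bellman(v: Sequence[float], r: Returns, P: Transitions) -> List[float]:
    """One step of successive approximations:
    (Tv)(s) = max_a [ r(s, a) + sum_j P(j | s, a) * v(j) ]."""
    n_states = len(v)
    return [
        max(
            r[s][a] + sum(P[s][a][j] * v[j] for j in range(n_states))
            for a in range(len(r[s]))
        )
        for s in range(n_states)
    ]


def sup_norm_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Sup-norm distance max_s |x(s) - y(s)|."""
    return max(abs(a - b) for a, b in zip(x, y))


def successive_approximations(
    r: Returns, P: Transitions, v0: Sequence[float],
    tol: float = 1e-9, max_iter: int = 100_000,
) -> Tuple[List[float], int]:
    """Iterate v_n = T v_{n-1} from the terminal value v0 until successive
    iterates differ by less than tol in the sup norm."""
    v = list(v0)
    for n in range(1, max_iter + 1):
        v_next = bellman(v, r, P)
        if sup_norm_distance(v_next, v) < tol:
            return v_next, n
        v = v_next
    raise RuntimeError("no sup-norm convergence within max_iter iterations")


# Every row of P sums to at most 0.9; the missing mass acts like a discount
# factor or a stopping probability, which makes T a 0.9-contraction here.
EXAMPLE_R = [[1.0, 2.0], [0.0, 0.5]]
EXAMPLE_P = [
    [[0.9, 0.0], [0.0, 0.8]],
    [[0.45, 0.45], [0.0, 0.9]],
]

if __name__ == "__main__":
    v, n = successive_approximations(EXAMPLE_R, EXAMPLE_P, [0.0, 0.0])
    print(f"v = {v} after {n} iterations")
```

In this toy model the iterates approach the fixed point (10, 90/11), roughly (10, 8.18), and a different terminal value function such as (100, -50) leads to the same limit. That independence from the terminal value is exactly what the sup-norm contraction buys in the bounded, discounted case; the abstract's contribution is to obtain uniform convergence under substantially weaker conditions, which this sketch does not attempt to reproduce.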

Jo van Nunen | Shaler Stidham