COMPUTING AVERAGE OPTIMAL CONSTRAINED POLICIES IN STOCHASTIC DYNAMIC PROGRAMMING

A stochastic dynamic program incurs two types of cost: a service cost and a quality-of-service (delay) cost. The objective is to minimize the expected average service cost subject to a constraint on the average quality-of-service cost. When the state space S is finite, we show how to compute an optimal policy for the general constrained problem under weak conditions. The development uses a Lagrange multiplier approach and value iteration. When S is denumerably infinite, we give a method for computing an optimal policy by means of a sequence of approximating finite-state problems. The method is illustrated with two computational examples.

Linn I. Sennott
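The Lagrange multiplier approach with value iteration summarized in the abstract can be illustrated with a small sketch. Writing c for the service cost, d for the delay cost and α for the bound on the average delay cost (notation assumed here), a fixed multiplier λ ≥ 0 turns the constrained problem into an unconstrained average-cost problem with per-stage cost c + λd, which relative value iteration solves; λ is then searched by bisection until the resulting deterministic policy meets the constraint. The queueing model, its truncation at N customers (loosely in the spirit of the approximating finite-state problems mentioned for denumerable S), every parameter value, and the function names build_model, relative_value_iteration, average_costs and constrained_policy are hypothetical. This is a sketch under those assumptions, not the paper's algorithm.

```python
# Hypothetical model (not from the paper): a discrete-time single-server queue
# truncated at N customers. State s = queue length; action 0 = slow service,
# action 1 = fast service. All parameter values below are assumptions.
N = 20                     # truncation level
P_ARRIVE = 0.4             # arrival probability per slot
Q_SERVE = (0.5, 0.8)       # service-completion probability for each action
SERVICE_COST = (0.0, 1.0)  # per-slot service cost for each action


def build_model(n=N):
    """Return P[a][s] = {next_state: prob}, service cost c[s][a], delay cost d[s][a]."""
    S, A = n + 1, len(Q_SERVE)
    P = [[{} for _ in range(S)] for _ in range(A)]
    for a in range(A):
        for s in range(S):
            q = Q_SERVE[a] if s > 0 else 0.0  # an empty queue has nothing to serve
            for arr, pa in ((1, P_ARRIVE), (0, 1.0 - P_ARRIVE)):
                for dep, pd in ((1, q), (0, 1.0 - q)):
                    t = min(max(s + arr - dep, 0), n)  # an arrival beyond n is lost
                    P[a][s][t] = P[a][s].get(t, 0.0) + pa * pd
    c = [[SERVICE_COST[a] for a in range(A)] for _ in range(S)]
    d = [[float(s)] * A for s in range(S)]  # delay cost = number in queue
    return P, c, d


def relative_value_iteration(P, cost, tol=1e-7, max_iter=100_000):
    """Relative value iteration for an average-cost MDP.

    Returns (gain estimate, greedy deterministic policy). Assumes the chain is
    unichain and aperiodic, which holds for the queue above."""
    S, A = len(cost), len(cost[0])
    h = [0.0] * S
    g = 0.0
    for _ in range(max_iter):
        Q = [[cost[s][a] + sum(p * h[t] for t, p in P[a][s].items())
              for a in range(A)] for s in range(S)]
        Th = [min(row) for row in Q]
        g = Th[0]                      # normalise relative values at state 0
        h_new = [v - g for v in Th]
        done = max(abs(x - y) for x, y in zip(h_new, h)) < tol
        h = h_new
        if done:
            break
    return g, [row.index(min(row)) for row in Q]


def average_costs(P, c, d, policy, tol=1e-13, max_iter=1_000_000):
    """Long-run average (service, delay) cost of a stationary deterministic policy,
    via power iteration for its stationary distribution (chain assumed aperiodic)."""
    S = len(policy)
    pi = [1.0 / S] * S
    for _ in range(max_iter):
        new = [0.0] * S
        for s in range(S):
            for t, p in P[policy[s]][s].items():
                new[t] += pi[s] * p
        done = max(abs(x - y) for x, y in zip(new, pi)) < tol
        pi = new
        if done:
            break
    total = sum(pi)
    C = sum(pi[s] * c[s][policy[s]] for s in range(S)) / total
    D = sum(pi[s] * d[s][policy[s]] for s in range(S)) / total
    return C, D


def constrained_policy(alpha, lam_hi=100.0, iters=20):
    """Bisection on the Lagrange multiplier lam for the per-stage cost c + lam * d.

    Returns (lam, policy, C, D) for the smallest lam found whose greedy
    deterministic policy satisfies the delay constraint D <= alpha."""
    P, c, d = build_model()

    def solve(lam):
        cost = [[c[s][a] + lam * d[s][a] for a in range(len(c[0]))]
                for s in range(len(c))]
        _, f = relative_value_iteration(P, cost)
        return f, average_costs(P, c, d, f)

    f_lo, (C_lo, D_lo) = solve(0.0)
    if D_lo <= alpha:                  # constraint is inactive
        return 0.0, f_lo, C_lo, D_lo
    f_hi, (C_hi, D_hi) = solve(lam_hi)
    if D_hi > alpha:
        raise ValueError("no feasible policy found at lam_hi")
    lo, hi = 0.0, lam_hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f, (C, D) = solve(mid)
        if D <= alpha:                 # feasible: move the upper end down
            hi, f_hi, C_hi, D_hi = mid, f, C, D
        else:                          # infeasible: move the lower end up
            lo = mid
    return hi, f_hi, C_hi, D_hi


if __name__ == "__main__":
    lam, policy, C, D = constrained_policy(alpha=1.0)
    print(f"lambda ~ {lam:.4f}, service cost {C:.4f}, delay cost {D:.4f}")
    print("fast-service states:", [s for s, a in enumerate(policy) if a == 1])
```

Bisection on λ is justified because the average delay cost of a policy minimizing the average of c + λd is nonincreasing in λ. The sketch stops at a feasible deterministic policy at the upper end of the final bracket; in general a constrained optimal policy may need to randomize between the deterministic policies on either side of the critical multiplier, which is not attempted here.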
