COMPUTING AVERAGE OPTIMAL CONSTRAINED POLICIES IN STOCHASTIC DYNAMIC PROGRAMMING

A stochastic dynamic program incurs two types of cost: a service cost and a quality-of-service (delay) cost. The objective is to minimize the expected average service cost, subject to a constraint on the average quality-of-service cost. When the state space S is finite, we show how to compute an optimal policy for the general constrained problem under weak conditions. The development uses a Lagrange multiplier approach and value iteration. When S is denumerably infinite, we give a method for computing an optimal policy, using a sequence of approximating finite-state problems. The method is illustrated with two computational examples.
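To make the Lagrange multiplier idea concrete, here is a minimal Python sketch for a finite state space. It is not the paper's algorithm: the toy queue (buffer size `N`, arrival probability `P_ARRIVE`, service probabilities `P_SERVE`, costs `SERVICE_COST`), the function names, and the bisection search on the multiplier are all assumptions made for illustration. For a fixed multiplier `lam`, relative value iteration minimizes the average of service cost plus `lam` times delay; the multiplier is then adjusted until the resulting policy meets the delay bound `alpha`.

```python
# Toy single-server queue. All numbers below are illustrative assumptions.
N = 5                      # buffer size; states are queue lengths 0..N
P_ARRIVE = 0.3             # arrival probability per time slot
P_SERVE = (0.2, 0.6)       # service-completion probability: action 0 = slow, 1 = fast
SERVICE_COST = (0.0, 1.0)  # per-slot service cost of each action


def transitions(s, a):
    """Return [(next_state, probability)] for state s under action a."""
    up = P_ARRIVE if s < N else 0.0
    down = P_SERVE[a] if s > 0 else 0.0
    return [(min(s + 1, N), up), (max(s - 1, 0), down), (s, 1.0 - up - down)]


def value_iteration(lam, tol=1e-9, max_iter=100_000):
    """Relative value iteration for the combined cost c(s, a) + lam * s."""
    h = [0.0] * (N + 1)
    policy = [0] * (N + 1)
    for _ in range(max_iter):
        new_h = []
        for s in range(N + 1):
            q = [SERVICE_COST[a] + lam * s
                 + sum(p * h[t] for t, p in transitions(s, a))
                 for a in (0, 1)]
            policy[s] = 0 if q[0] <= q[1] else 1   # ties go to the cheaper action
            new_h.append(min(q))
        offset = new_h[0]                          # normalize at reference state 0
        new_h = [v - offset for v in new_h]
        if max(abs(x - y) for x, y in zip(new_h, h)) < tol:
            break
        h = new_h
    return policy


def average_costs(policy, sweeps=2000):
    """Average (service cost, delay) of a stationary policy via power iteration."""
    pi = [1.0 / (N + 1)] * (N + 1)
    for _ in range(sweeps):
        nxt = [0.0] * (N + 1)
        for s in range(N + 1):
            for t, p in transitions(s, policy[s]):
                nxt[t] += pi[s] * p
        pi = nxt
    service = sum(pi[s] * SERVICE_COST[policy[s]] for s in range(N + 1))
    delay = sum(pi[s] * s for s in range(N + 1))
    return service, delay


def solve_constrained(alpha, lam_hi=50.0, steps=40):
    """Bisect on the multiplier; lam_hi always corresponds to a feasible policy."""
    lam_lo = 0.0
    policy = value_iteration(lam_hi)
    if average_costs(policy)[1] > alpha:
        raise ValueError("delay constraint not met even at lam_hi")
    for _ in range(steps):
        mid = 0.5 * (lam_lo + lam_hi)
        candidate = value_iteration(mid)
        if average_costs(candidate)[1] <= alpha:
            lam_hi, policy = mid, candidate
        else:
            lam_lo = mid
    return lam_hi, policy


if __name__ == "__main__":
    lam, policy = solve_constrained(alpha=2.0)
    print("multiplier:", lam, "policy:", policy, "costs:", average_costs(policy))
```

The sketch returns a feasible deterministic policy only. In constrained problems of this kind the optimal policy may need to randomize between two deterministic policies at the critical multiplier, a step this example leaves out.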
