Constrained Average Cost Markov Control Processes in Borel Spaces

This paper considers constrained Markov control processes in Borel spaces with unbounded costs. The criterion to be minimized is a long-run expected average cost, and the constraints can be imposed on similar average costs, on average rewards, or on discounted costs or rewards. We give conditions under which the constrained problem (CP) is solvable and equivalent to an equality-constrained (EC) linear program. Furthermore, we show that there is no duality gap between EC and its dual program EC*, and that in fact strong duality holds. Finally, we introduce an explicit procedure for solving CP in some cases and illustrate it with a detailed example.
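
For intuition only, the linear program can be sketched in the simplest setting: a finite state set X, a finite action set A, and constraints on average costs. This is an illustrative analogue, not the paper's formulation, which works in Borel spaces with unbounded costs. The symbols \mu (a state-action frequency), Q (the transition law), c (the cost to be minimized), d_i and k_i (constraint costs and bounds), and s_i (slack variables) are notational assumptions made here.

```latex
\begin{aligned}
\text{minimize}\quad
  & \sum_{x \in X}\sum_{a \in A} c(x,a)\,\mu(x,a) \\
\text{subject to}\quad
  & \sum_{a \in A} \mu(y,a)
    = \sum_{x \in X}\sum_{a \in A} Q(y \mid x,a)\,\mu(x,a)
    \quad \text{for all } y \in X, \\
  & \sum_{x \in X}\sum_{a \in A} \mu(x,a) = 1, \\
  & \sum_{x \in X}\sum_{a \in A} d_i(x,a)\,\mu(x,a) + s_i = k_i,
    \quad i = 1,\dots,q, \\
  & \mu(x,a) \ge 0, \qquad s_i \ge 0 .
\end{aligned}
```

In this sketch the first constraint says that \mu is invariant under the transition law, and the slack variables s_i turn the inequality constraints "average of d_i at most k_i" into equalities, which is one standard way an equality-constrained program can arise.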
