Constrained Semi-Markov decision processes with average rewards

This paper deals with constrained average reward Semi-Markov Decision Processes (SMDPs) with finite state and action sets. We consider two average reward criteria. The first criterion is time-average rewards, which equal the lower limits of the expected average rewards per unit time as the horizon tends to infinity. The second criterion is ratio-average rewards, which equal the lower limits of the ratios of the expected total rewards during the first n steps to the expected total duration of these n steps as n → ∞. For both criteria, we prove the existence of optimal mixed stationary policies for constrained problems when the constraints are of the same nature as the objective functions. For unichain problems, we show the existence of randomized stationary policies which are optimal for both criteria. However, optimal mixed stationary policies may be different for each of these criteria, even for unichain problems. We provide linear programming algorithms for the computation of optimal policies.
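The two criteria described above can be written out as follows. This is a sketch in notation assumed here for illustration, not taken from the paper: x is the initial state, π a policy, r(x_i, a_i) and τ(x_i, a_i) the reward and sojourn time of step i, and R(t) the total reward accumulated up to time t. The last display is the textbook linear program for a unichain problem with constraints of the same form; it is given only as an example of the kind of linear program involved and may differ from the algorithms in the paper. The constraint rewards r_k and levels l_k are likewise assumed names.

```latex
% Time-average reward: expected reward per unit time over a long horizon
W(x,\pi) = \liminf_{t\to\infty} \frac{1}{t}\, \mathbb{E}_x^{\pi}\, R(t)

% Ratio-average reward: expected reward of the first n steps
% divided by the expected duration of those n steps
U(x,\pi) = \liminf_{n\to\infty}
  \frac{\mathbb{E}_x^{\pi} \sum_{i=0}^{n-1} r(x_i,a_i)}
       {\mathbb{E}_x^{\pi} \sum_{i=0}^{n-1} \tau(x_i,a_i)}

% Textbook unichain linear program in variables z(s,a) \ge 0
% (an assumption of this sketch, not necessarily the paper's formulation)
\begin{aligned}
\max_{z \ge 0}\quad & \sum_{s,a} r(s,a)\, z(s,a) \\
\text{s.t.}\quad
  & \sum_{a} z(j,a) - \sum_{s,a} p(j \mid s,a)\, z(s,a) = 0 \quad \text{for all states } j, \\
  & \sum_{s,a} \tau(s,a)\, z(s,a) = 1, \\
  & \sum_{s,a} r_k(s,a)\, z(s,a) \ge l_k, \quad k = 1,\dots,K .
\end{aligned}
```

Under these assumptions, a randomized stationary policy is read off an optimal solution z by choosing action a in state s with probability z(s,a) / Σ_{a'} z(s,a') whenever the denominator is positive. The normalization Σ τ(s,a) z(s,a) = 1 is what turns a ratio of two linear functions into a linear objective.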
