Constrained Undiscounted Stochastic Dynamic Programming

In this paper we investigate the computation of optimal policies in constrained discrete stochastic dynamic programming with the average reward as the utility function. The state space and the action sets are assumed to be finite. Constraints that are linear functions of the state-action frequencies are allowed. In the general multichain case, an optimal policy may have to be a randomized nonstationary policy. An algorithm to compute such an optimal policy is presented. Furthermore, sufficient conditions for optimal policies to be stationary are derived. Constrained undiscounted stochastic dynamic programming has many applications, for example in multiple-objective Markovian decision models.
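The phrase "linear functions of the state-action frequencies" can be made concrete with a minimal sketch in Python. It assumes a unichain model, in which every stationary policy induces a single recurrent class. That is a simplifying assumption, not the general multichain setting treated here. The function names and the two-state example are hypothetical. For a stationary randomized policy q, the long-run state-action frequencies are x(i,a) = π_i q(a|i), where π is the stationary distribution of the induced chain. The average reward Σ r(i,a) x(i,a) and every constraint of the form Σ c(i,a) x(i,a) are then linear in x.

```python
from fractions import Fraction


def stationary_distribution(P):
    """Solve pi P = pi, sum(pi) = 1 exactly by Gauss-Jordan elimination.
    Assumes a single recurrent class, so the solution is unique."""
    n = len(P)
    # Row i encodes sum_j pi_j P[j][i] - pi_i = 0, i.e. the system (P^T - I) pi = 0.
    A = [[Fraction(P[j][i]) - (1 if i == j else 0) for j in range(n)] for i in range(n)]
    b = [Fraction(0)] * n
    # One balance equation is redundant; replace it by the normalization sum(pi) = 1.
    A[-1] = [Fraction(1)] * n
    b[-1] = Fraction(1)
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
                b[r] -= factor * b[col]
    return [b[i] / A[i][i] for i in range(n)]


def state_action_frequencies(P_sa, policy):
    """P_sa[i][a][j] = p(j | i, a); policy[i][a] = q(a | i).
    Returns x[i][a] = pi_i * q(a | i) for the chain induced by the policy."""
    n = len(P_sa)
    # Transition matrix of the Markov chain induced by the stationary randomized policy.
    P = [[sum(Fraction(policy[i][a]) * Fraction(P_sa[i][a][j]) for a in range(len(P_sa[i])))
          for j in range(n)] for i in range(n)]
    pi = stationary_distribution(P)
    return [[pi[i] * Fraction(policy[i][a]) for a in range(len(P_sa[i]))] for i in range(n)]


def balance_residuals(P_sa, x):
    """For each state j: sum_a x[j][a] - sum_{i,a} p(j | i, a) x[i][a].
    These are zero for frequencies induced by a stationary policy."""
    n = len(P_sa)
    return [sum(x[j]) - sum(Fraction(P_sa[i][a][j]) * x[i][a]
                            for i in range(n) for a in range(len(P_sa[i])))
            for j in range(n)]


def linear_functional(x, coeff):
    """sum_{i,a} coeff[i][a] * x[i][a]; the average reward and each
    constraint function have this form."""
    return sum(c * xi for row_c, row_x in zip(coeff, x) for c, xi in zip(row_c, row_x))


def policy_from_frequencies(x):
    """Recover q(a | i) = x[i][a] / sum_b x[i][b] in states with positive frequency.
    States with zero frequency get None (the action there is left unspecified)."""
    return [[xa / sum(row) for xa in row] if sum(row) > 0 else None for row in x]


# Hypothetical two-state example: action 0 stays put in both states,
# action 1 switches to the other state.
P_sa = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]
policy = [[Fraction(3, 4), Fraction(1, 4)], [Fraction(1, 2), Fraction(1, 2)]]
x = state_action_frequencies(P_sa, policy)
print(x)                                          # x = [[1/2, 1/6], [1/6, 1/6]]
print(linear_functional(x, [[1, 0], [0, 4]]))     # average reward 7/6
print(linear_functional(x, [[0, 1], [1, 0]]))     # constraint value 1/3
```

To compute a constrained optimum rather than evaluate a given policy, one would maximize `linear_functional(x, r)` over x ≥ 0 subject to zero balance residuals, Σ x = 1 and the linear constraints, for example with an off-the-shelf LP solver, and then map the solution back with `policy_from_frequencies`. Under the unichain assumption this gives a stationary randomized policy; in the general multichain case described above, this single-LP construction is not sufficient.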
