Linear programming formulation of MDPs in countable state space: The multichain case

We present a linear programming (LP) formulation of Markov decision processes (MDPs) with countable state and action spaces and no unichain assumption. It extends the formulation of Hordijk and Kallenberg (1979) for finite state and action spaces. We provide sufficient conditions for both the existence of optimal solutions to the primal LP and the absence of a duality gap. Under these conditions, the existence of a (possibly randomized) average-optimal policy is also guaranteed. The existence of a deterministic stationary average-optimal policy is also investigated.
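For orientation, the finite state and action LP pair of Hordijk and Kallenberg (1979), which this work extends, can be sketched as follows. The sketch uses standard textbook notation that is assumed here rather than taken from the paper: rewards $r(s,a)$ to be maximized, transition probabilities $p(j \mid s,a)$, and a fixed weight vector $\alpha$ with $\alpha_j > 0$ and $\sum_j \alpha_j = 1$. The countable-state formulation of the paper itself, which may be stated in terms of costs, is not reproduced here.

```latex
\begin{aligned}
&\text{(P)}\quad \min_{g,\,h \text{ free}}\ \sum_{j} \alpha_j\, g(j) \\
&\quad\text{s.t. } g(s) \ge \sum_{j} p(j \mid s,a)\, g(j), && \forall\, s,\ a \in A(s),\\
&\qquad\ \ g(s) + h(s) \ge r(s,a) + \sum_{j} p(j \mid s,a)\, h(j), && \forall\, s,\ a \in A(s),\\[6pt]
&\text{(D)}\quad \max_{x,\,y \ge 0}\ \sum_{s,a} r(s,a)\, x(s,a) \\
&\quad\text{s.t. } \sum_{a} x(j,a) - \sum_{s,a} p(j \mid s,a)\, x(s,a) = 0, && \forall\, j,\\
&\qquad\ \ \sum_{a} x(j,a) + \sum_{a} y(j,a) - \sum_{s,a} p(j \mid s,a)\, y(s,a) = \alpha_j, && \forall\, j.
\end{aligned}
```

The first constraint family in (P) is what the multichain case adds. Without a unichain assumption the optimal gain may depend on the initial state, so $g$ is a vector rather than a scalar. In (D), $x$ can be read as limiting state-action frequencies, with $y$ absorbing the transient behaviour; (D) is obtained from (P) by attaching $y(s,a)$ to the first constraint family and $x(s,a)$ to the second.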

Jean B. Lasserre | Arie Hordijk

[1] V. Borkar. A convex analytic approach to Markov decision processes. 1988.

[2] E. Denardo et al. Multichain Markov Renewal Programs. 1968.

[3] C. Derman et al. Finite State Markovian Decision Processes. 1970.

[4] J. Lasserre. Average Optimal Stationary Policies and Linear Programming in Countable Space Markov Decision Processes. 1994.

[5] M. Kurano. The existence of minimum pair of state and policy for Markov decision processes under the hypothesis of Doeblin. 1989.

[6] A. S. Manne. Linear Programming and Sequential Decisions. 1960.

[7] K. Yamada. Duality theorem in Markovian decision problems. 1975.

[8] A. Hordijk et al. Linear Programming and Markov Decision Chains. 1979.

[9] L. C. M. Kallenberg et al. Linear programming and finite Markovian control problems. 1984.

[10] E. Denardo. On Linear Programming in a Markov Decision Problem. 1970.

[11] E. Altman et al. Markov decision problems and state-action frequencies. 1991.

[12] E. Anderson. Linear Programming in Infinite Dimensional Spaces. 1970.

[13] O. Hernández-Lerma et al. Linear Programming and Average Optimality of Markov Control Processes on Borel Spaces: Unbounded Costs. 1994.