Time-Sharing Policies for Controlled Markov Chains

We propose a class of nonstationary policies called policy time sharing (PTS), which possesses several desirable properties for problems whose criteria are of the average-cost type: an optimal policy exists within this class, optimal policies are straightforward to compute, and they are easy to implement. While in the finite-state case stationary policies are also known to share these properties, the new policies are much more flexible: they can be applied to adaptive problems, and they suggest new ways to incorporate the particular structure of the problem at hand into the derivation of optimal policies. In addition, they provide insight into the pathwise structure of controlled Markov chains. To use a PTS policy, one alternates between several stationary deterministic policies, switching whenever the chain reaches some predetermined state. In some countable-state cases, optimal solutions of the PTS type are available and easy to compute, whereas optimal stationary policies are not. Examples that illustrate this last point and the usefulness of the new approach are discussed, involving constrained optimization problems with a countable state space or a compact action space.
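The switching rule described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' construction. It assumes a finite state space, a designated switching state `ref` that is reached with probability 1 from every state under every policy used, and a hypothetical scheme in which `policies[i]` is run for `counts[i]` consecutive returns to `ref` before the next policy in a fixed cyclic order takes over. Because each full sweep starts and ends at `ref`, the renewal-reward theorem gives the long-run average cost as the expected cost per sweep divided by the expected length per sweep. `simulate_pts` checks that formula by actually running the policy. The model `P`, `c` and the policies `g_A`, `g_B` are made-up numbers for illustration.

```python
import random


def cycle_stats(P, c, policy, ref, tol=1e-12, max_iter=100_000):
    """Expected cost and expected length of one cycle that starts at `ref`
    and ends at the first return to `ref`, under a stationary deterministic
    policy (a list mapping state -> action).

    Solves h(s) = c(s, g(s)) + sum_{t != ref} P(t | s, g(s)) h(t) by
    fixed-point iteration (assumes `ref` is reached w.p. 1 from every state).
    """
    n = len(P)
    cost, length = [0.0] * n, [0.0] * n
    for _ in range(max_iter):
        new_cost, new_length = [0.0] * n, [0.0] * n
        for s in range(n):
            a = policy[s]
            new_cost[s] = c[s][a] + sum(P[s][a][t] * cost[t] for t in range(n) if t != ref)
            new_length[s] = 1.0 + sum(P[s][a][t] * length[t] for t in range(n) if t != ref)
        diff = max(abs(x - y) for x, y in zip(new_cost + new_length, cost + length))
        cost, length = new_cost, new_length
        if diff < tol:
            break
    return cost[ref], length[ref]


def pts_average_cost(P, c, policies, counts, ref):
    """Long-run average cost of a time-sharing policy that runs policies[i]
    for counts[i] cycles through `ref`, then moves to the next policy.
    Renewal-reward: expected cost per sweep / expected length per sweep."""
    total_cost = total_length = 0.0
    for policy, k in zip(policies, counts):
        cyc_cost, cyc_len = cycle_stats(P, c, policy, ref)
        total_cost += k * cyc_cost
        total_length += k * cyc_len
    return total_cost / total_length


def simulate_pts(P, c, policies, counts, ref, rounds, seed=0):
    """Monte Carlo check: switch to the next policy only after the chain has
    returned to `ref` the prescribed number of times."""
    rng = random.Random(seed)
    n = len(P)
    s, cost, steps = ref, 0.0, 0
    for _ in range(rounds):
        for policy, k in zip(policies, counts):
            for _ in range(k):  # k cycles under this stationary policy
                while True:
                    a = policy[s]
                    cost += c[s][a]
                    steps += 1
                    s = rng.choices(range(n), weights=P[s][a])[0]
                    if s == ref:  # cycle ends on return to the switching state
                        break
    return cost / steps


# Hypothetical two-state, two-action model; state 0 is the switching state.
# P[s][a] lists next-state probabilities, c[s][a] is the one-step cost.
P = [[[0.5, 0.5], [0.9, 0.1]],
     [[1.0, 0.0], [0.5, 0.5]]]
c = [[1.0, 3.0],
     [0.0, 2.0]]
g_A, g_B = [0, 0], [1, 1]

print(pts_average_cost(P, c, [g_A, g_B], [1, 1], ref=0))  # ≈ 1.6296
print(simulate_pts(P, c, [g_A, g_B], [1, 1], ref=0, rounds=20_000))
```

Here `g_A` has cycle cost 1 and cycle length 1.5, and `g_B` has cycle cost 3.4 and cycle length 1.2, so with `counts = [1, 1]` the formula gives (1 + 3.4)/(1.5 + 1.2) ≈ 1.6296. Any choice of `counts` yields a value between the pure-policy averages 2/3 (`g_A`) and 17/6 (`g_B`), since the result is a ratio of weighted sums of the per-cycle quantities.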

Eitan Altman | Adam Shwartz