Comparing Policies in Markov Decision Processes: Mandl's Lemma Revisited

A general framework is developed for comparing the long-run average cost of a Markov stationary policy with that of another related policy. The underlying methodology extends ideas of Mandl to randomized policies and to Polish state and action spaces. Sufficient conditions for the applicability of the methodology are given. These conditions, which are easy to verify, have a natural probabilistic interpretation in terms of the "stability" of the chain and of the convergence of the control values. The usefulness of the proposed framework is illustrated through several applications. Standard results on the convergence of adaptive policies are readily recovered under conditions which are more transparent than those found in the literature, and the convergence of randomized policies is handled as a special case. Finally, a novel application to "probing controls" is outlined.
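The comparison at the heart of the framework can be made concrete in the simplest setting. The following is a minimal sketch, not the paper's methodology: for a finite-state chain induced by a stationary policy, the long-run average cost is the stationary distribution weighted by the per-state cost, so two policies can be compared by solving the stationary equations for each induced chain. The two-state policies `P_f`, `P_g` and the cost vector `c` below are hypothetical examples.

```python
import numpy as np

def average_cost(P, c):
    """Long-run average cost of an ergodic finite chain:
    solve pi P = pi with sum(pi) = 1, then return pi @ c."""
    n = P.shape[0]
    # Stack the balance equations (P^T - I) pi = 0 with the
    # normalization constraint sum(pi) = 1, and solve by least squares.
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(pi @ c)

# Hypothetical two-state example: chains induced by two stationary policies.
P_f = np.array([[0.9, 0.1],
                [0.2, 0.8]])   # chain under policy f
P_g = np.array([[0.5, 0.5],
                [0.5, 0.5]])   # chain under policy g
c = np.array([0.0, 1.0])       # unit cost incurred in state 1

J_f = average_cost(P_f, c)     # -> 1/3 (stationary mass on state 1 under f)
J_g = average_cost(P_g, c)     # -> 1/2
```

Here policy f is preferred since J_f < J_g. In the paper's general setting (Polish spaces, randomized policies) no such closed-form comparison is available, which is what motivates the Mandl-type conditions on stability of the chain and convergence of the control values.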
