Algorithms for optimization and stabilization of controlled Markov chains

Abstract

This article reviews some recent results by the author on the optimal control of Markov chains. Two common algorithms for the construction of optimal policies are considered: value iteration and policy iteration. In either case, the following hold when the algorithm is properly initialized: (i) a stochastic Lyapunov function exists for each intermediate policy, and hence each policy is regular (a strong stability condition); (ii) the intermediate costs converge to the optimal cost; (iii) any limiting policy is average-cost optimal. The network scheduling problem is considered in some detail, both as an illustration of the theory and because of the strong conclusions that the general theory yields for this important example.
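As a concrete illustration of the first algorithm mentioned above, the sketch below implements relative value iteration for a small average-cost Markov decision process. This is a standard textbook form of value iteration for the average-cost criterion, not the author's specific construction; the three-state model, transition matrices, and costs are hypothetical numbers chosen only for illustration.

```python
import numpy as np

# Hypothetical 3-state, 2-action controlled Markov chain (illustration only):
# P[a][x][y] = transition probability x -> y under action a; c[x][a] = one-step cost.
P = np.array([
    [[0.7, 0.2, 0.1],
     [0.1, 0.8, 0.1],
     [0.2, 0.3, 0.5]],   # action 0
    [[0.3, 0.4, 0.3],
     [0.5, 0.3, 0.2],
     [0.4, 0.4, 0.2]],   # action 1
])
c = np.array([[2.0, 1.0],
              [0.5, 3.0],
              [4.0, 2.5]])

def relative_value_iteration(P, c, ref=0, tol=1e-9, max_iter=10_000):
    """Relative value iteration for the average-cost criterion.

    Iterates h <- min_a [c(., a) + P_a h] - (value at a reference state),
    which keeps h bounded; for a unichain finite MDP the subtracted offset
    converges to the optimal average cost, and the greedy policy at the
    fixed point is average-cost optimal.
    """
    n = c.shape[0]
    h = np.zeros(n)
    for _ in range(max_iter):
        # Q[x, a] = c(x, a) + sum_y P[a, x, y] * h[y]
        Q = c + np.einsum('axy,y->xa', P, h)
        Th = Q.min(axis=1)
        g = Th[ref]                 # current estimate of the optimal average cost
        h_new = Th - g              # normalize so h_new[ref] = 0
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    policy = Q.argmin(axis=1)       # greedy (limiting) policy
    return g, h, policy

g, h, policy = relative_value_iteration(P, c)
print("optimal average cost:", g, "policy:", policy)
```

At the fixed point, h solves the average-cost optimality equation g + h(x) = min_a [c(x, a) + (P_a h)(x)], so h plays the role of the relative value function (and, shifted appropriately, of a stochastic Lyapunov function for the greedy policy).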