The Relations Among Potentials, Perturbation Analysis, and Markov Decision Processes

This paper provides an introductory discussion of an important concept, the performance potentials of Markov processes, and of its relations with perturbation analysis (PA), average-cost Markov decision processes (MDPs), Poisson equations, α-potentials, the fundamental matrix, and the group inverse of the transition matrix (or of the infinitesimal generator). Applications to performance sensitivity estimation and performance optimization based on a single sample path are also discussed. On-line algorithms for estimating performance sensitivities and on-line schemes for policy iteration are presented. The approach is closely related to reinforcement learning algorithms.
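To make the relations among these objects concrete, here is a minimal sketch. It assumes a finite ergodic chain with transition matrix P, cost vector f and average cost η = πf. The stationary distribution π solves π(I - P + ee^T) = e^T. The potentials are computed from the fundamental-matrix form g = (I - P + eπ)^(-1) f, which solves the Poisson equation (I - P)g + ηe = f with πg = η. Along P_δ = P + δ(P' - P) with f fixed, the potentials give the performance derivative dη/dδ = π(P' - P)g. They also drive one standard form of average-cost policy iteration. The function names, the example chain and the pure-Python solver are hypothetical illustrations. They are not the paper's algorithms, and the paper's on-line, sample-path estimators are not shown. The policy-iteration loop additionally assumes that every deterministic policy yields an ergodic chain.

```python
"""Potentials of a finite ergodic Markov chain (illustrative sketch).

Assumptions: dense row-stochastic matrices as nested lists, an ergodic
chain under every policy, and average cost. All names are hypothetical.
"""


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                factor = M[r][c] / M[c][c]
                for k in range(c, n + 1):
                    M[r][k] -= factor * M[c][k]
    return [M[i][n] / M[i][i] for i in range(n)]


def stationary(P):
    """pi solves pi (I - P + e e^T) = e^T, i.e. pi P = pi and pi e = 1."""
    n = len(P)
    # Transposed system: (I - P + e e^T)^T pi^T = e.
    A = [[(i == j) - P[j][i] + 1.0 for j in range(n)] for i in range(n)]
    return solve(A, [1.0] * n)


def potentials(P, f):
    """Return (eta, pi, g) with g = (I - P + e pi)^(-1) f.

    g solves the Poisson equation (I - P) g + eta e = f with pi g = eta.
    """
    n = len(P)
    pi = stationary(P)
    A = [[(i == j) - P[i][j] + pi[j] for j in range(n)] for i in range(n)]
    g = solve(A, f)
    eta = sum(p * c for p, c in zip(pi, f))
    return eta, pi, g


def derivative(P, P_new, f):
    """d eta / d delta at delta = 0 along P + delta (P_new - P), f fixed."""
    _, pi, g = potentials(P, f)
    n = len(P)
    return sum(pi[i] * (P_new[i][j] - P[i][j]) * g[j]
               for i in range(n) for j in range(n))


def policy_iteration(actions):
    """Average-cost policy iteration using potentials.

    actions[i] is a list of (cost, transition_row) pairs for state i.
    """
    n = len(actions)
    policy = [0] * n
    while True:
        P = [actions[i][policy[i]][1] for i in range(n)]
        f = [actions[i][policy[i]][0] for i in range(n)]
        eta, _, g = potentials(P, f)  # policy evaluation
        new_policy = []
        for i in range(n):  # policy improvement: min f(i,a) + sum_j p^a(i,j) g(j)
            q = [c + sum(p * gj for p, gj in zip(row, g))
                 for c, row in actions[i]]
            best = min(range(len(q)), key=q.__getitem__)
            # Keep the current action unless another is strictly better.
            if q[best] >= q[policy[i]] - 1e-12:
                best = policy[i]
            new_policy.append(best)
        if new_policy == policy:
            return policy, eta
        policy = new_policy


if __name__ == "__main__":
    P = [[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.1, 0.4, 0.5]]
    f = [1.0, 3.0, 2.0]
    eta, pi, g = potentials(P, f)
    print("eta =", eta, "pi =", pi, "g =", g)
    P_new = [[0.2, 0.5, 0.3], [0.4, 0.4, 0.2], [0.3, 0.3, 0.4]]
    print("d eta / d delta =", derivative(P, P_new, f))
```

Only one matrix inversion, at the nominal P, is needed for the derivative. This is the point of the potential-based formula, in contrast to re-solving for η at every perturbed P. A central finite difference of η is a convenient cross-check.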
