An Approximation Approach for the Deviation Matrix of Continuous-Time Markov Processes with Application to Markov Decision Theory

We present an update formula that expresses the deviation matrix of a continuous-time Markov process with denumerable state space and generator matrix Q* in terms of that of a continuous-time Markov process with generator matrix Q. We show that, under suitable stability conditions, the resulting algorithm converges at a geometric rate. We illustrate the broad applicability of the approach with three examples: the M/M/1 queue with vacations, the M/G/1 queue, and a tandem network. Finally, we apply the approximation algorithm within Markov decision theory to compute the optimal policy for an admission control problem. Numerical examples highlight the efficiency of the proposed algorithm.
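To fix ideas about the central object, the following is a minimal numerical sketch, not the paper's update formula, of the deviation matrix for an ergodic finite-state continuous-time Markov chain: with stationary distribution pi and ergodic projector Pi (every row equal to pi), the deviation matrix is D = (Pi - Q)^{-1} - Pi and satisfies Q D = D Q = Pi - I and D 1 = 0. The M/M/1/N generator, the rate values, and the function names below are illustrative assumptions rather than constructions taken from the paper.

```python
import numpy as np

# Illustrative sketch (assumed setup, not the paper's algorithm):
# deviation matrix of an ergodic finite-state CTMC via
#     D = (Pi - Q)^{-1} - Pi,
# where Pi is the ergodic projector (every row equals the stationary
# distribution pi).  The M/M/1/N queue below is used only as an example.

def mm1n_generator(lam: float, mu: float, N: int) -> np.ndarray:
    """Generator matrix of an M/M/1/N queue on states {0, ..., N}."""
    Q = np.zeros((N + 1, N + 1))
    for i in range(N):
        Q[i, i + 1] = lam   # arrival
        Q[i + 1, i] = mu    # service completion
    np.fill_diagonal(Q, -Q.sum(axis=1))
    return Q

def stationary_distribution(Q: np.ndarray) -> np.ndarray:
    """Solve pi Q = 0 with pi summing to one (exact for an ergodic Q)."""
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def deviation_matrix(Q: np.ndarray) -> np.ndarray:
    """Deviation matrix D = (Pi - Q)^{-1} - Pi of an ergodic finite CTMC."""
    pi = stationary_distribution(Q)
    Pi = np.tile(pi, (Q.shape[0], 1))  # ergodic projector: rows equal pi
    return np.linalg.inv(Pi - Q) - Pi

if __name__ == "__main__":
    Q = mm1n_generator(lam=1.0, mu=2.0, N=10)
    D = deviation_matrix(Q)
    Pi = np.tile(stationary_distribution(Q), (Q.shape[0], 1))
    # Sanity checks: Poisson-type identity Q D = Pi - I and zero row sums.
    print(np.allclose(Q @ D, Pi - np.eye(Q.shape[0])))  # True
    print(np.allclose(D.sum(axis=1), 0.0))              # True
```

For a denumerable state space the matrix inverse above is not directly available, which is exactly the situation in which an approximation scheme of the kind proposed in the paper becomes relevant.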
