Continuous Time Discounted Jump Markov Decision Processes: A Discrete-Event Approach

This paper introduces and develops a new approach to the theory of continuous-time jump Markov decision processes (CTJMDPs). The approach reduces discounted CTJMDPs to discounted semi-Markov decision processes (SMDPs) and, eventually, to discrete-time Markov decision processes (MDPs). The reduction rests on the equivalence of strategies that change actions between jumps and randomized strategies that change actions only at jump epochs. This equivalence holds both for one-criterion problems and for multiple-objective problems with constraints. In particular, the paper develops the theory for multiple-objective problems with expected total discounted rewards and constraints. If such a problem is feasible, optimal policies of three types exist: (i) nonrandomized switching stationary policies, (ii) randomized stationary policies for the CTJMDP, and (iii) randomized stationary policies for the corresponding SMDP with exponentially distributed sojourn times; these policies can be implemented as randomized strategies in the CTJMDP.
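
For orientation, the following is a minimal sketch of the classical data transformation underlying such reductions, stated under the simplifying assumptions of a bounded jump rate $q(x,a)$, a bounded reward rate $r(x,a)$, and a discount rate $\alpha>0$; this notation is introduced here only for illustration and is not taken from the paper. If the sojourn time $\tau$ in state $x$ under action $a$ is exponentially distributed with rate $q(x,a)$, then
\[
\mathbb{E}\!\left[\int_0^{\tau} e^{-\alpha t}\, r(x,a)\, dt\right] \;=\; \frac{r(x,a)}{\alpha + q(x,a)},
\qquad
\mathbb{E}\!\left[e^{-\alpha \tau}\right] \;=\; \frac{q(x,a)}{\alpha + q(x,a)}.
\]
Hence a strategy that changes actions only at jump epochs accrues the same expected total discounted reward in the CTJMDP as in a discrete-time MDP with one-step reward $r(x,a)/(\alpha+q(x,a))$ and state-action-dependent discount factor $q(x,a)/(\alpha+q(x,a))$. The reduction described in the abstract goes further by relating strategies that may change actions between jumps to randomized strategies of this simpler type, including in the constrained multiple-objective setting.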
