Numerical analysis of continuous time Markov decision processes over finite horizons

Continuous-time Markov decision processes (CTMDPs) with a finite state and action space have been studied for a long time. It is known that, under fairly general conditions, the reward gained over a finite horizon can be maximized by a so-called piecewise constant policy, which changes only finitely often in a finite interval. Although this result has been available for more than 30 years, numerical approaches to computing the optimal policy and reward have been restricted to discretization methods, which are known to converge to the true solution as the discretization step goes to zero. In this paper, we present a new method that is based on uniformization of the CTMDP and allows an ε-optimal policy to be computed up to a predefined precision in a numerically stable way using adaptive time steps.
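To make the uniformization idea concrete, the following is a minimal sketch of Jensen's method for a plain CTMC (not the full CTMDP algorithm of the paper): the generator Q is turned into a discrete-time matrix P = I + Q/α, and the transient distribution is a Poisson-weighted sum of powers of P, truncated once the remaining Poisson mass falls below a tolerance. Function names and the two-state example are illustrative, not from the paper.

```python
import math
import numpy as np

def uniformize(Q, alpha=None):
    """Uniformize generator Q: return DTMC matrix P = I + Q/alpha and the rate alpha."""
    if alpha is None:
        # alpha must dominate the largest exit rate max_i(-Q[i,i])
        alpha = max(-Q[i, i] for i in range(Q.shape[0]))
    P = np.eye(Q.shape[0]) + Q / alpha
    return P, alpha

def transient_distribution(Q, p0, t, eps=1e-9):
    """Jensen's method: p(t) = sum_k e^{-alpha t} (alpha t)^k / k! * p0 P^k,
    truncated when the accumulated Poisson mass exceeds 1 - eps."""
    P, alpha = uniformize(Q)
    weight = math.exp(-alpha * t)    # Poisson(alpha*t) probability at k = 0
    v = p0.copy()                    # p0 P^k, currently k = 0
    result = weight * v
    accumulated = weight
    k = 0
    while 1.0 - accumulated > eps:
        k += 1
        v = v @ P                    # advance one uniformized step
        weight *= alpha * t / k      # recursive Poisson weight update
        result += weight * v
        accumulated += weight
    return result

# Illustrative two-state chain: rate 1.0 from state 0 to 1, rate 0.5 back.
Q = np.array([[-1.0, 1.0],
              [0.5, -0.5]])
p0 = np.array([1.0, 0.0])
p = transient_distribution(Q, p0, t=2.0)
```

All weights in the truncated sum are nonnegative, which is what gives the method its numerical stability compared with evaluating the matrix exponential directly.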
