Optimal decisions for continuous time Markov decision processes over finite planning horizons

The computation of ε-optimal policies for continuous time Markov decision processes (CTMDPs) over finite time intervals is a difficult problem because the optimal policy may change at arbitrary times. Numerical algorithms based on time discretization or uniformization have been proposed for the computation of optimal policies. The uniformization based algorithm has been shown to be more reliable and often also more efficient, but it is currently only available for processes where the gain or reward does not depend on the decision taken in a state. In this paper, we present two new uniformization based algorithms for computing ε-optimal policies for CTMDPs with decision dependent rewards over a finite time horizon. Owing to a new and tighter upper bound, the newly proposed algorithms can not only be applied to decision dependent rewards but also outperform the available approach for rewards that do not depend on the decision. In particular, for models where the policy changes only rarely, optimal policies can be computed much faster.

Highlights

A new algorithm to compute accumulated rewards for continuous time Markov decision processes with action dependent rewards over finite horizons.

A proof that the algorithm guarantees a global error that is at most linear in the time step.

An experimental comparison of available algorithms to analyze accumulated rewards for continuous time Markov decision processes with action dependent rewards over finite horizons.
