Bounded Aggregation for Continuous Time Markov Decision Processes

Markov decision processes suffer from two problems: state space explosion, which may lead to prohibitively long computation times, and the memoryless property of states, which limits modeling power with respect to real systems. In this paper we combine existing state aggregation and optimization methods into a new aggregation-based optimization method. More specifically, we compute reward bounds on an aggregated model, trading state space size for uncertainty. We propose an approach for continuous-time Markov decision models with discounted or average reward measures.
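The trade of state space size for uncertainty described above can be illustrated by interval value iteration on a bounded-parameter MDP: after uniformization, each aggregate state carries probability intervals rather than exact transition probabilities, and optimistic/pessimistic extreme distributions yield upper and lower bounds on the optimal discounted reward. The following sketch is an assumption-laden illustration of that general technique, not the paper's algorithm; the function names, the tiny two-macro-state model, and all numeric parameters are invented for the example.

```python
# Illustrative sketch: interval value iteration for a uniformized, aggregated
# CTMDP with discounted reward.  The model data below are made up.

def extreme_distribution(p_lo, p_hi, order):
    """Pick the distribution inside the intervals [p_lo, p_hi] that puts as
    much probability mass as possible on the states listed first in `order`."""
    p = list(p_lo)
    budget = 1.0 - sum(p_lo)              # mass still free to distribute
    for i in order:
        add = min(p_hi[i] - p_lo[i], budget)
        p[i] += add
        budget -= add
    return p

def interval_value_iteration(R, P_lo, P_hi, gamma, iters=500):
    """Lower/upper bounds on the optimal discounted value of each aggregate
    state.  R[s][a] is the reward of action a in state s; P_lo[s][a] and
    P_hi[s][a] bound the transition probabilities of the uniformized chain;
    gamma would be Lambda / (Lambda + beta) for uniformization rate Lambda
    and discount rate beta."""
    n = len(R)
    V_lo = [0.0] * n
    V_hi = [0.0] * n
    for _ in range(iters):
        # optimistic order: high-value successors receive high probability
        hi_order = sorted(range(n), key=lambda j: -V_hi[j])
        # pessimistic order: low-value successors receive high probability
        lo_order = sorted(range(n), key=lambda j: V_lo[j])
        new_lo, new_hi = [], []
        for s in range(n):
            best_lo = best_hi = float("-inf")
            for a in range(len(R[s])):
                p_opt = extreme_distribution(P_lo[s][a], P_hi[s][a], hi_order)
                p_pes = extreme_distribution(P_lo[s][a], P_hi[s][a], lo_order)
                best_hi = max(best_hi, R[s][a]
                              + gamma * sum(p * v for p, v in zip(p_opt, V_hi)))
                best_lo = max(best_lo, R[s][a]
                              + gamma * sum(p * v for p, v in zip(p_pes, V_lo)))
            new_lo.append(best_lo)
            new_hi.append(best_hi)
        V_lo, V_hi = new_lo, new_hi
    return V_lo, V_hi

# Tiny invented model: two aggregate states; state 0 has two actions.
R = [[1.0, 0.5], [0.0]]
P_lo = [[[0.6, 0.2], [0.1, 0.7]], [[0.3, 0.5]]]
P_hi = [[[0.8, 0.4], [0.3, 0.9]], [[0.5, 0.7]]]
V_lo, V_hi = interval_value_iteration(R, P_lo, P_hi, gamma=0.9)
assert all(lo <= hi for lo, hi in zip(V_lo, V_hi))
```

The true optimal values of the original (unaggregated) model would lie between the two bound vectors; wider probability intervals, i.e. coarser aggregation, make the enclosure looser.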
