Lightweight Monte Carlo Algorithm for Markov Decision Processes

Markov decision processes are widely used to model optimisation problems and concurrent systems, but only relatively small models can be solved exactly, because the number of states of a system is typically intractable. Existing algorithms find approximate solutions by considering schedulers based on the states visited by simulations, but the number of visited states also rapidly becomes intractable. We present a lightweight Monte Carlo algorithm that can be used for statistical model checking of Markov decision processes and other models that mix nondeterminism with probabilistic transitions. The algorithm uses an O(1) memory representation of general schedulers, based on pseudo-random number generators and hash functions, and can be efficiently parallelised. We provide confidence bounds and propose novel ways in which the algorithm may be profitably extended.
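
The idea of an O(1) memory scheduler can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a scheduler is identified by a single integer and that each nondeterministic choice is resolved by hashing that integer together with the trace so far, then using the hash to seed a pseudo-random number generator. The toy model `MDP`, the function names, the use of SHA-256, and the sample size taken from the standard Chernoff-Okamoto bound for a single estimate are all assumptions made for illustration.

```python
import hashlib
import math
import random

# Hypothetical toy MDP (not from the paper): state -> list of actions,
# each action being a list of (probability, next_state) pairs.
MDP = {
    "s0": [[(0.9, "goal"), (0.1, "fail")],   # action 0
           [(0.2, "goal"), (0.8, "fail")]],  # action 1
    "goal": [],
    "fail": [],
}

def choose_action(scheduler_id, trace, n_actions):
    """Resolve a nondeterministic choice from a hash of (scheduler id, trace).

    The scheduler is never stored explicitly: the same id and trace always
    produce the same action, so the id alone represents the scheduler.
    """
    key = f"{scheduler_id}|{'/'.join(trace)}".encode()
    seed = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return random.Random(seed).randrange(n_actions)

def simulate(scheduler_id, rng, start="s0", max_steps=100):
    """Run one simulation under the given scheduler; True if 'goal' is reached."""
    trace = [start]
    for _ in range(max_steps):
        actions = MDP[trace[-1]]
        if not actions:          # absorbing state
            break
        dist = actions[choose_action(scheduler_id, trace, len(actions))]
        u, acc = rng.random(), 0.0
        nxt = dist[-1][1]        # fallback guards against rounding error
        for p, s in dist:
            acc += p
            if u < acc:
                nxt = s
                break
        trace.append(nxt)
    return trace[-1] == "goal"

def sample_size(eps, delta):
    """Simulations needed so that P(|estimate - p| >= eps) <= delta
    for one scheduler (Chernoff-Okamoto bound)."""
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))

def estimate(scheduler_id, eps=0.05, delta=0.05, seed=0):
    """Monte Carlo estimate of reaching 'goal' under one scheduler."""
    rng = random.Random(seed)
    n = sample_size(eps, delta)
    return sum(simulate(scheduler_id, rng) for _ in range(n)) / n
```

In this sketch, an approximately optimal scheduler could be sought by calling `estimate` for many integer identifiers and keeping the best one. The calls are independent and could run in parallel. The bound in `sample_size` covers a single estimate only, so comparing many schedulers would need a correspondingly adjusted bound.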
