Approximate planning and verification for large Markov decision processes

We study the planning and verification problems for very large probabilistic systems, such as Markov decision processes (MDPs), from a complexity point of view. More precisely, we address the design of efficient approximation methods to compute a near-optimal policy for the planning problem in discounted MDPs, together with the satisfaction probabilities of properties of interest, such as reachability or safety, over the Markov chain obtained by restricting the MDP to the near-optimal policy. We present two different approaches. The first is based on sparse sampling, while the second uses a variant of the multiplicative weights update algorithm. The first method has a complexity independent of the size of the state space and requires only a probabilistic generator of the MDP; we give a complete analysis of this approach, whose main control parameter is the targeted quality of the approximation. The second approach is more prospective and differs in that it can be controlled dynamically by observing its speed of convergence. Parts of this paper were previously presented by the same authors in Lassaigne and Peyronnet (Proceedings of the ACM Symposium on Applied Computing, SAC 2012, pp. 1314–1319, ACM, 2012).
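
As an illustration of the first approach, the following is a minimal sketch, in the spirit of the sparse sampling algorithm of Kearns, Mansour and Ng [33], of how a near-optimal action can be estimated from a probabilistic generator alone, without enumerating the state space. The names sample and actions and the parameter values are assumptions made for this example, not the interface defined in the paper.

    # Sketch of sparse-sampling planning: estimate Q-values with a sampled
    # look-ahead tree of depth H and C successor samples per (state, action)
    # pair, using only a generative model of the MDP.

    def sparse_sampling_action(state, sample, actions, gamma=0.95, H=3, C=20):
        """Return an estimated near-optimal action at `state`.

        sample(s, a) -> (next_state, reward) draws one transition from
        the MDP's probabilistic generator.
        """
        def q_value(s, a, depth):
            if depth == 0:
                return 0.0
            total = 0.0
            for _ in range(C):
                s_next, r = sample(s, a)
                # Value of the sampled successor: best estimated Q over all actions.
                v_next = max(q_value(s_next, b, depth - 1) for b in actions)
                total += r + gamma * v_next
            return total / C

        # Cost is O((len(actions) * C) ** H), independent of the state space size.
        return max(actions, key=lambda a: q_value(state, a, H))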

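Once a policy is fixed, the satisfaction probability of a bounded reachability (or, dually, safety) property over the induced Markov chain can be estimated by Monte Carlo sampling with a Chernoff-Hoeffding bound on the number of runs, as in approximate probabilistic model checking [28]. The sketch below is again only illustrative; step, target and the parameter values are hypothetical names chosen for this example.

    import math

    def estimate_reachability(s0, step, target, horizon, epsilon=0.05, delta=0.01):
        """Estimate Pr[reach a target state within `horizon` steps from s0].

        step(s) -> next state, sampled from the induced Markov chain;
        target(s) -> True iff s satisfies the reachability predicate.
        By the Chernoff-Hoeffding bound, n = ceil(ln(2/delta) / (2*epsilon**2))
        independent runs give an estimate within epsilon of the true
        probability with confidence at least 1 - delta.
        """
        n = math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))
        hits = 0
        for _ in range(n):
            s = s0
            steps_left = horizon
            # Simulate one run of at most `horizon` steps.
            while not target(s) and steps_left > 0:
                s = step(s)
                steps_left -= 1
            if target(s):
                hits += 1
        return hits / n
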
[1] Thomas A. Henzinger et al. Reactive Modules. Formal Methods Syst. Des., 1999.

[2] Mihalis Yannakakis et al. The complexity of probabilistic verification. J. ACM, 1995.

[3] Thomas Hérault et al. APMC 3.0: Approximate Verification of Discrete and Continuous Time Markov Chains. Third International Conference on the Quantitative Evaluation of Systems (QEST'06), 2006.

[4] John Fearnley et al. Exponential Lower Bounds for Policy Iteration. ICALP, 2010.

[5] Ronald A. Howard. Dynamic Programming and Markov Processes, 1960.

[6] Thomas Hérault et al. Distribution, Approximation and Probabilistic Model Checking. PDMC@ICALP, 2006.

[7] Andrew Hinton et al. PRISM: A Tool for Automatic Verification of Probabilistic Systems. TACAS, 2006.

[8] Khaled Hamidouche et al. Three High Performance Architectures in the Parallel APMC Boat. Ninth International Workshop on Parallel and Distributed Methods in Verification and Second International Workshop on High Performance Computational Systems Biology, 2010.

[9] Gerald Tesauro et al. On-line Policy Improvement using Monte-Carlo Search. NIPS, 1996.

[10] Peter Auer et al. Finite-time Analysis of the Multiarmed Bandit Problem. Machine Learning, 2002.

[11] Edmund M. Clarke et al. Statistical Model Checking for Markov Decision Processes. Ninth International Conference on Quantitative Evaluation of Systems (QEST 2012), 2012.

[12] Richard Lassaigne et al. Probabilistic verification and approximation. Ann. Pure Appl. Log., 2008.

[13] Sanjeev Arora et al. The Multiplicative Weights Update Method: a Meta-Algorithm and Applications. Theory Comput., 2012.

[14] Marta Z. Kwiatkowska et al. PRISM 2.0: A tool for probabilistic model checking. First International Conference on the Quantitative Evaluation of Systems (QEST 2004), 2004.

[15] Richard Lassaigne et al. Approximate planning and verification for large Markov decision processes. SAC, 2012.

[16] Axel Legay et al. Lightweight Monte Carlo Algorithm for Markov Decision Processes. arXiv, 2013.

[17] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables, 1963.

[18] Steven I. Marcus et al. A survey of some simulation-based algorithms for Markov decision processes. Commun. Inf. Syst., 2007.

[19] Oliver Friedmann et al. An Exponential Lower Bound for the Parity Game Strategy Improvement Algorithm as We Know it. 24th Annual IEEE Symposium on Logic in Computer Science (LICS 2009), 2009.

[20] Nancy A. Lynch et al. Probabilistic Simulations for Probabilistic Processes. Nord. J. Comput., 1994.

[21] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.

[22] Andrea Bianco et al. Model Checking of Probabilistic and Nondeterministic Systems. FSTTCS, 1995.

[23] D. A. Castanon et al. Rollout Algorithms for Stochastic Scheduling Problems. Proceedings of the 37th IEEE Conference on Decision and Control, 1998.

[24] Luca de Alfaro et al. Symbolic Model Checking of Probabilistic Processes Using MTBDDs and the Kronecker Representation. TACAS, 2000.

[25] Moshe Y. Vardi. Automatic verification of probabilistic concurrent finite state programs. 26th Annual Symposium on Foundations of Computer Science (FOCS 1985), 1985.

[26] Frédéric Magniez et al. Approximate Satisfiability and Equivalence. 21st Annual IEEE Symposium on Logic in Computer Science (LICS'06), 2006.

[27] Peter Bro Miltersen et al. Strategy Iteration Is Strongly Polynomial for 2-Player Turn-Based Stochastic Games with a Constant Discount Factor. J. ACM, 2010.

[28] Thomas Hérault et al. Approximate Probabilistic Model Checking. VMCAI, 2004.

[29] Marta Z. Kwiatkowska et al. PRISM 4.0: Verification of Probabilistic Real-Time Systems. CAV, 2011.

[30] Yinyu Ye et al. A New Complexity Result on Solving the Markov Decision Problem. Math. Oper. Res., 2005.

[31] Richard M. Karp et al. Monte-Carlo algorithms for enumeration and reliability problems. 24th Annual Symposium on Foundations of Computer Science (FOCS 1983), 1983.

[32] Michel de Rougemont et al. Statistic Analysis for Probabilistic Processes. 24th Annual IEEE Symposium on Logic in Computer Science (LICS 2009), 2009.

[33] Yishay Mansour et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes. Machine Learning, 1999.