An Anytime Algorithm for Reachability on Uncountable MDPs

We provide an algorithm for reachability on Markov decision processes with uncountable state and action spaces that, under mild assumptions, approximates the optimal value to any desired precision. It is the first such anytime algorithm: at any point during its execution, it can return the current approximation together with its precision. Moreover, it is the first algorithm able to utilize \emph{learning} approaches without sacrificing guarantees, and it can be combined with existing heuristics.
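The anytime principle described above, maintaining convergent lower and upper bounds on the reachability value so that the current precision is known at every step, can be sketched with bounded ("interval") value iteration on a small finite MDP. This is only an illustration of the general idea, not the paper's construction for uncountable spaces; the transition structure and all names below are illustrative assumptions.

```python
# Bounded value iteration for maximal reachability on a tiny finite MDP,
# maintaining lower bounds L and upper bounds U on the true value.
# Each state maps to a list of actions; an action is a list of
# (successor, probability) pairs.  (Hypothetical example model.)
TRANSITIONS = {
    0: [[(1, 0.5), (3, 0.5)],                # risky jump towards state 1
        [(0, 0.8), (2, 0.1), (3, 0.1)]],     # slow drift towards the goal
    1: [[(2, 0.7), (3, 0.3)]],
}
GOAL, SINK = 2, 3  # absorbing target and absorbing failure state

def anytime_reachability(eps=1e-6, max_iters=10_000):
    """Return (lower, upper) bounds on Pr_max[reach GOAL from state 0].

    At every iteration L[s] <= value(s) <= U[s], so the loop may be
    stopped at any time and report its current precision U[0] - L[0]."""
    states = set(TRANSITIONS) | {GOAL, SINK}
    L = {s: 1.0 if s == GOAL else 0.0 for s in states}
    U = {s: 0.0 if s == SINK else 1.0 for s in states}
    for _ in range(max_iters):
        for s in TRANSITIONS:  # Bellman backup for both bounds
            L[s] = max(sum(p * L[t] for t, p in a) for a in TRANSITIONS[s])
            U[s] = max(sum(p * U[t] for t, p in a) for a in TRANSITIONS[s])
        if U[0] - L[0] < eps:  # anytime stopping criterion
            break
    return L[0], U[0]
```

In this toy model the optimal strategy loops on the second action of state 0, whose value solves v = 0.8v + 0.1, i.e. v = 0.5, and the two bounds converge to it from below and above. On general MDPs, the upper iteration additionally requires end-component handling, as in interval iteration; the learning-based and heuristic-guided variants mentioned above refine which states get backed up, while the bounds keep the result sound.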
