Monte-Carlo Policy Synthesis in POMDPs with Quantitative and Qualitative Objectives

Autonomous robots operating in uncertain environments must often plan under a mix of formal qualitative requirements (for example, that the robot reach a goal location safely) and quantitative optimality criteria (for example, that the path to the goal be as short or as energy-efficient as possible). Such problems can be modeled as Partially Observable Markov Decision Processes (POMDPs) with quantitative and qualitative objectives. In this paper, we present a new policy synthesis algorithm for such POMDPs, called Policy Synthesis with Statistical Model Checking (PO-SMC). Whereas previous policy synthesis approaches for this setting rely on symbolic tools (for example, satisfiability solvers) to meet the qualitative requirements, our approach is based on Monte Carlo sampling and uses statistical model checking to ensure that the qualitative requirements are satisfied with high confidence. An appeal of statistical model checking is that it can handle rich temporal requirements such as safe-reachability, which combines safety and reachability into a single qualitative requirement, while scaling far better than symbolic methods. Although our use of sampling introduces approximations that symbolic approaches do not require, we present theoretical results showing that the resulting approximation error is bounded. Our experiments demonstrate that PO-SMC consistently runs orders of magnitude faster than existing symbolic methods for policy synthesis under qualitative and quantitative requirements.
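
To make the statistical-model-checking step concrete, the sketch below estimates the probability that a candidate policy satisfies a safe-reachability objective by sampling rollouts and sizing the sample with the Chernoff-Hoeffding bound. This is a minimal illustration under stated assumptions, not PO-SMC itself: the simulator interface (simulate, is_goal, is_unsafe) and the parameter names are hypothetical stand-ins for a concrete POMDP simulator.

```python
import math

# Hypothetical simulator interface (not from PO-SMC):
#   simulate(policy, horizon) -> list of visited states for one rollout
#   is_goal(state)   -> True if the state satisfies the reachability goal
#   is_unsafe(state) -> True if the state violates the safety requirement

def satisfies_safe_reachability(trajectory, is_goal, is_unsafe):
    """A run satisfies safe-reachability iff it reaches a goal state
    without entering an unsafe state first."""
    for state in trajectory:
        if is_unsafe(state):
            return False
        if is_goal(state):
            return True
    return False  # horizon exhausted without reaching the goal

def smc_estimate(policy, simulate, is_goal, is_unsafe,
                 horizon=100, epsilon=0.05, delta=0.01):
    """Estimate the probability that `policy` satisfies safe-reachability,
    with additive error at most epsilon at confidence 1 - delta.
    The Chernoff-Hoeffding bound gives the required sample size:
        n >= ln(2 / delta) / (2 * epsilon^2)."""
    n = math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))
    successes = 0
    for _ in range(n):
        trajectory = simulate(policy, horizon)
        if satisfies_safe_reachability(trajectory, is_goal, is_unsafe):
            successes += 1
    return successes / n
```

With epsilon = 0.05 and delta = 0.01, the bound requires ceil(ln(200) / 0.005) = 1060 rollouts regardless of the size of the POMDP, which is one reason a sampling-based check can scale better than symbolic exploration of the belief space. A caller would accept a candidate policy only if the estimate clears the required satisfaction threshold by more than epsilon.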
