Strengthening Deterministic Policies for POMDPs

The synthesis problem for partially observable Markov decision processes (POMDPs) is to compute a policy that satisfies a given specification. Such policies have to take the full execution history of a POMDP into account, rendering the problem undecidable in general. A common approach is to use a limited amount of memory and to randomize over potential choices. Yet even this restricted problem is NP-hard and often computationally intractable in practice. A further restriction is to use neither history nor randomization, yielding policies that are called stationary and deterministic. Previous approaches to compute such policies employ mixed-integer linear programming (MILP). We provide a novel MILP encoding that supports sophisticated specifications in the form of temporal logic constraints and can handle arbitrarily many such specifications. Yet randomization and memory are often mandatory to achieve satisfactory policies. First, we extend our encoding to deliver a restricted class of randomized policies. Second, based on the results of the original MILP, we preprocess the POMDP to incorporate memory-based decisions. The advantages of our approach over state-of-the-art POMDP solvers are (1) the flexibility to strengthen simple deterministic policies without losing computational tractability and (2) the ability to enforce the provable satisfaction of arbitrarily many specifications. The latter allows trade-offs between the performance and safety aspects of typical POMDP examples to be taken into account. We show the effectiveness of our method on a broad range of benchmarks.
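
To illustrate the kind of MILP encoding the abstract refers to, the following is a minimal sketch, not the paper's exact formulation: binary variables pick one action per observation (a stationary deterministic policy), and big-M constraints linearize the induced reachability probabilities. It uses the open-source PuLP library, an invented toy POMDP, and a maximum-reachability objective; it assumes, for simplicity, that no cycle can avoid the goal, so the additional constraints a full encoding needs for such cycles are omitted.

```python
# Minimal sketch of a big-M MILP for an observation-based, stationary,
# deterministic POMDP policy maximizing goal-reachability probability.
# The toy POMDP and all names below are hypothetical, for illustration only.
import pulp

states = ["s0", "s1", "goal", "sink"]
actions = ["a", "b"]
obs = {"s0": "z", "s1": "z", "goal": "done", "sink": "dead"}  # s0, s1 aliased
goal_states = {"goal"}
sink_states = {"sink"}  # goal unreachable from here
# P[s][a] = list of (successor, probability)
P = {
    "s0": {"a": [("goal", 0.6), ("sink", 0.4)], "b": [("goal", 0.3), ("sink", 0.7)]},
    "s1": {"a": [("goal", 0.2), ("sink", 0.8)], "b": [("goal", 0.9), ("sink", 0.1)]},
    "goal": {"a": [("goal", 1.0)], "b": [("goal", 1.0)]},
    "sink": {"a": [("sink", 1.0)], "b": [("sink", 1.0)]},
}

prob = pulp.LpProblem("pomdp_det_policy", pulp.LpMaximize)

# Binary policy variables: exactly one action per observation.
sigma = {(z, a): pulp.LpVariable(f"sigma_{z}_{a}", cat=pulp.LpBinary)
         for z in set(obs.values()) for a in actions}
for z in set(obs.values()):
    prob += pulp.lpSum(sigma[(z, a)] for a in actions) == 1

# Reachability probability per state.
p = {s: pulp.LpVariable(f"p_{s}", lowBound=0.0, upBound=1.0) for s in states}
for s in goal_states:
    prob += p[s] == 1
for s in sink_states:
    prob += p[s] == 0

# Big-M linearization: p_s <= sum_{s'} P(s,a,s') * p_{s'} whenever action a is
# chosen for obs(s); unchosen actions relax the bound by 1.  Correct here only
# because this toy model has no cycles that avoid the goal.
for s in states:
    if s in goal_states or s in sink_states:
        continue
    for a in actions:
        prob += p[s] <= (1 - sigma[(obs[s], a)]) + pulp.lpSum(
            pr * p[t] for (t, pr) in P[s][a])

# Maximize total reachability probability over the non-absorbing states.
prob += pulp.lpSum(p[s] for s in states if s not in goal_states | sink_states)

prob.solve(pulp.PULP_CBC_CMD(msg=False))
chosen = {z: a for (z, a) in sigma if pulp.value(sigma[(z, a)]) > 0.5}
print("policy:", chosen, "p(s0) =", pulp.value(p["s0"]), "p(s1) =", pulp.value(p["s1"]))
```

In this toy instance the aliased states s0 and s1 force a single choice for observation z; maximizing the summed reachability probability selects action b, which is exactly the kind of observation-level trade-off the MILP resolves. Temporal logic specifications and the randomized or memory-based strengthenings described in the abstract would add further constraints and a product construction on top of this basic scheme.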
