Learning-based supervisor synthesis of POMDPs for PCTL specifications

Partially Observable Markov Decision Processes (POMDPs) have been widely used in robotics to model uncertainty arising from sensors, actuators, and the environment. However, this modeling power makes planning for POMDPs generally difficult. Existing work typically searches for an optimal control policy with respect to predefined reward functions, which may require large memory and is computationally expensive. We propose to use formal methods and learn a Deterministic Finite Automaton (DFA) that serves as a supervisor regulating the behavior of a POMDP, such that the supervised system satisfies a given specification in Probabilistic Computation Tree Logic (PCTL). For this purpose, we modify the L* learning algorithm and define oracles for membership queries and conjectures. We further show that termination and correctness of the proposed algorithm are guaranteed. A simple example illustrates the approach.
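To make the learning loop concrete, below is a minimal Python sketch of an L*-style algorithm. Everything in it is a toy stand-in invented for illustration: the alphabet, the target language, and both oracles. In the paper's setting, the membership oracle would instead invoke a PCTL model checker (such as PRISM) to decide whether the POMDP, supervised by a candidate behavior, still satisfies a specification such as P>=0.9 [ F goal ] ("the goal is eventually reached with probability at least 0.9"), and the conjecture oracle would validate or refute the candidate DFA supervisor.

# A minimal L*-style learning loop in the spirit of Angluin's algorithm,
# which the paper modifies. The oracles below are hypothetical stand-ins:
# is_member answers membership for a toy regular language, and
# find_counterexample brute-forces a conjecture check up to a bounded
# word length. In the paper, both would be answered via PCTL model checking.

from itertools import product

ALPHABET = ("a", "b")

def is_member(word):
    # Hypothetical membership oracle: accept words with an even count of 'a'.
    return word.count("a") % 2 == 0

def find_counterexample(accepts, max_len=6):
    # Hypothetical conjecture oracle: search for a word on which the
    # hypothesis DFA disagrees with the membership oracle.
    for n in range(max_len + 1):
        for letters in product(ALPHABET, repeat=n):
            w = "".join(letters)
            if accepts(w) != is_member(w):
                return w
    return None  # no disagreement found: conjecture accepted

def lstar():
    S, E = {""}, {""}   # access strings (rows) and experiments (columns)
    table = {}          # observation table: word -> membership

    def fill():
        for s in set(S) | {s + a for s in S for a in ALPHABET}:
            for e in E:
                table.setdefault(s + e, is_member(s + e))

    def row(s):
        return tuple(table[s + e] for e in sorted(E))

    while True:
        fill()
        # Closedness: every one-letter extension of a row in S must equal
        # the row of some access string already in S.
        unclosed = [s + a for s in list(S) for a in ALPHABET
                    if all(row(s + a) != row(t) for t in S)]
        if unclosed:
            S.add(unclosed[0])
            continue
        # Build the hypothesis DFA from the closed table.
        reps = {}
        for s in sorted(S, key=len):
            reps.setdefault(row(s), s)

        def accepts(word):
            state = reps[row("")]
            for ch in word:
                state = reps[row(state + ch)]
            return table[state]  # "" is in E, so table[state] is defined

        cex = find_counterexample(accepts)
        if cex is None:
            return accepts, reps  # learned DFA is consistent with the oracle
        # Refinement: add all suffixes of the counterexample as new
        # distinguishing experiments (Maler-Pnueli-style update).
        for i in range(len(cex) + 1):
            E.add(cex[i:])

if __name__ == "__main__":
    accepts, states = lstar()
    print("learned", len(states), "states")  # expect 2 for this toy language
    print(accepts("abab"), accepts("ab"))    # True False

Here counterexamples are processed by adding their suffixes as experiments rather than their prefixes as rows, which keeps the observation table consistent by construction; this is a standard L* variant and not necessarily the modification made in the paper, where queries and conjectures are resolved by probabilistic model checking rather than by direct language membership.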
