Permissive Supervisor Synthesis for Markov Decision Processes Through Learning

This paper considers permissive supervisor synthesis for probabilistic systems modeled as Markov Decision Processes (MDPs). Such systems are prevalent in power grids, transportation networks, communication networks, and robotics. We propose a novel supervisor synthesis framework that uses automata learning and compositional model checking to generate permissive local supervisors in a distributed manner. Leveraging recent advances in assume-guarantee reasoning for MDPs, the framework avoids constructing the composed system and thereby alleviates state-space explosion. The supervisors are learned iteratively from counterexamples returned by the verification step, and the procedure is guaranteed to terminate in finitely many steps and to produce correct supervisors.
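To make the counterexample-guided loop concrete, the following Python sketch shows the general pattern on a toy MDP: start from a maximally permissive supervisor, verify a probabilistic reachability specification, and prune permitted actions when a counterexample is found. This is only a minimal illustration under assumed names and numbers (the states, actions, probabilities, and threshold are hypothetical), and the pruning step is a drastic simplification of the paper's L*-based learning with assume-guarantee verification.

```python
# Toy MDP: state -> action -> list of (next_state, probability).
# All states, actions, and probabilities here are hypothetical.
MDP = {
    "s0": {"a": [("s1", 0.5), ("bad", 0.5)],
           "b": [("s1", 0.9), ("bad", 0.1)]},
    "s1": {"a": [("goal", 1.0)]},
    "bad": {},    # absorbing error state
    "goal": {},   # absorbing goal state
}
INIT, BAD = "s0", "bad"
THRESHOLD = 0.2   # specification: Pmax[reach bad] <= 0.2


def max_reach_prob(supervisor, iters=100):
    """Value iteration for the maximal probability of reaching BAD
    when only supervisor-permitted actions may be scheduled."""
    p = {s: 1.0 if s == BAD else 0.0 for s in MDP}
    for _ in range(iters):
        for s, actions in MDP.items():
            if s == BAD or not actions:
                continue
            permitted = supervisor[s] & set(actions)
            if permitted:
                p[s] = max(sum(pr * p[t] for t, pr in actions[a])
                           for a in permitted)
    return p


def worst_action(supervisor, p, state):
    """Pick the permitted action with the highest reachability value:
    a crude stand-in for extracting a counterexample sub-MDP."""
    permitted = supervisor[state] & set(MDP[state])
    return max(permitted,
               key=lambda a: sum(pr * p[t] for t, pr in MDP[state][a]))


def synthesize():
    """Counterexample-guided loop: start maximally permissive,
    verify, and prune a violating action until the spec holds."""
    supervisor = {s: set(MDP[s]) for s in MDP}
    while True:
        p = max_reach_prob(supervisor)
        if p[INIT] <= THRESHOLD:
            return supervisor            # specification holds
        if len(supervisor[INIT]) == 1:
            return None                  # nothing left to disable
        # Refinement (simplified here to the initial state only):
        # forbid the action driving the violation.
        supervisor[INIT].discard(worst_action(supervisor, p, INIT))


if __name__ == "__main__":
    sup = synthesize()
    print("permitted actions:", {s: sorted(a) for s, a in (sup or {}).items()})
```

In this toy run the supervisor disables action "a" in state "s0", after which the maximal probability of reaching the error state drops to 0.1 and the specification is satisfied; the paper's framework instead learns such supervisors as automata and verifies the components compositionally rather than on the composed system.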
