Reinforcement Learning with Quantitative Verification for Assured Multi-Agent Policies

In multi-agent reinforcement learning, multiple agents learn concurrently, converging towards policies that jointly solve complex decision-making problems. This learning process is inherently stochastic, which makes its use in safety-critical domains problematic. To address this issue, we introduce a new approach that combines multi-agent reinforcement learning with a formal verification technique termed quantitative verification. Our assured multi-agent reinforcement learning approach constrains agent behaviours in ways that ensure the satisfaction of requirements associated with the safety, reliability, and other non-functional aspects of the decision-making problem being solved. The approach comprises three stages. First, the problem is modelled as an abstract Markov decision process, enabling the application of quantitative verification. Next, this abstract model is used to synthesise a policy that satisfies safety, reliability, and performance constraints. Finally, the synthesised policy is used to constrain agent behaviour within the low-level problem, greatly reducing the risk of constraint violations. We demonstrate our approach using a safety-critical multi-agent patrolling problem.
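The abstract names the three stages but not their implementation. As a minimal sketch of the third stage only, and under stated assumptions, the Python code below masks each learner's action set so that independent Q-learners explore only actions a previously verified abstract policy permits. Everything here is illustrative rather than the paper's actual method: the policy table (assumed to be the output of stage 2, e.g. from a probabilistic model checker checking a PRISM-style property such as P>=0.9 [ G !"hazard" ]), the state abstraction, and the `ConstrainedQLearner` class are all hypothetical names.

```python
import random
from collections import defaultdict

# Hypothetical output of stage 2: for each abstract state, the set of
# actions that quantitative verification has shown to satisfy the
# safety/reliability constraints. Assumes every abstract state retains
# at least one verified action.
VERIFIED_ABSTRACT_POLICY = {
    "sector_A": {"patrol", "recharge"},
    "sector_B": {"patrol"},
}

ACTIONS = ["patrol", "recharge", "cross_hazard"]


def abstract_state(state):
    """Map a low-level state (sector, position) to its abstract sector."""
    return state[0]


def allowed_actions(state):
    """Restrict choices to actions the verified abstract policy permits."""
    permitted = VERIFIED_ABSTRACT_POLICY[abstract_state(state)]
    return [a for a in ACTIONS if a in permitted]


class ConstrainedQLearner:
    """Independent Q-learner whose exploration is confined to verified actions."""

    def __init__(self, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)  # (state, action) -> estimated value

    def choose_action(self, state):
        candidates = allowed_actions(state)
        if random.random() < self.epsilon:
            return random.choice(candidates)  # explore, but only safely
        return max(candidates, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        best_next = max(self.q[(next_state, a)] for a in allowed_actions(next_state))
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```

In a multi-agent setting, one such learner per agent, each masking its own action set, would yield the constrained joint behaviour the abstract describes; a full implementation would also need the problem-specific mapping between low-level and abstract transitions.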
