Safety-Aware Apprenticeship Learning

Apprenticeship learning (AL) is a class of "learning from demonstrations" techniques in which the reward function of a Markov Decision Process (MDP) is unknown to the learning agent, and the agent must derive a good policy by observing an expert's demonstrations. In this paper, we study the problem of making AL algorithms inherently safe while still meeting their learning objective. We consider a setting where the unknown reward function is assumed to be a linear combination of a set of state features, and the safety property is specified in Probabilistic Computation Tree Logic (PCTL). By embedding probabilistic model checking inside AL, we propose a novel counterexample-guided approach that ensures both the safety and the performance of the learned policy. We demonstrate the effectiveness of our approach on several challenging AL scenarios where safety is essential.
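For readers unfamiliar with the formulation, the following is a minimal sketch of the assumed setting, written in the spirit of the abstract but not copied from the paper: the reward is a linear combination of state features, the learner matches the expert's feature expectations (as in Abbeel and Ng's apprenticeship learning), and safety is a PCTL bound on reaching unsafe states. The symbols w, φ, μ_E, γ, ε, λ and the atomic proposition "unsafe" are illustrative assumptions, not the paper's notation.

% Illustrative sketch only; the notation below is assumed, not taken from the paper.
\begin{align}
  R(s) &= w^{\top}\phi(s), \qquad \lVert w \rVert_{1} \le 1
    && \text{(unknown reward, linear in state features)} \\
  \mu(\pi) &= \mathbb{E}\!\left[\textstyle\sum_{t=0}^{\infty}\gamma^{t}\phi(s_{t}) \,\middle|\, \pi\right]
    && \text{(feature expectations of policy } \pi\text{)} \\
  \text{find } \pi:\ & \lVert \mu(\pi) - \mu_{E} \rVert_{2} \le \epsilon
    \quad \text{s.t.}\quad \mathcal{M}^{\pi} \models \mathcal{P}_{\le \lambda}\big[\Diamond\, \mathit{unsafe}\big]
    && \text{(match expert, satisfy PCTL safety bound)}
\end{align}

Under this reading, the counterexample-guided loop would alternate between an apprenticeship-learning update and a probabilistic model-checking query on the Markov chain induced by the current policy; whenever the PCTL bound is violated, the returned counterexample restricts the next policy search.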
