Safety-Aware Apprenticeship Learning

Apprenticeship learning (AL) is a class of "learning from demonstrations" techniques in which the reward function of a Markov Decision Process (MDP) is unknown to the learning agent, and the agent must derive a good policy by observing an expert's demonstrations. In this paper, we study the problem of making AL algorithms inherently safe while still meeting their learning objective. We consider a setting where the unknown reward function is assumed to be a linear combination of a set of state features, and the safety property is specified in Probabilistic Computation Tree Logic (PCTL). By embedding probabilistic model checking inside AL, we propose a novel counterexample-guided approach that ensures both the safety and the performance of the learned policy. We demonstrate the effectiveness of our approach on several challenging AL scenarios where safety is essential.
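For readers unfamiliar with the formulation, the following is a minimal sketch of the assumed setting, written in the spirit of the abstract but not copied from the paper: the reward is a linear combination of state features, the learner matches the expert's feature expectations (as in Abbeel and Ng's apprenticeship learning), and safety is a PCTL bound on reaching unsafe states. The symbols w, φ, μ_E, γ, ε, λ and the atomic proposition "unsafe" are illustrative assumptions, not the paper's notation.

% Illustrative sketch only; the notation below is assumed, not taken from the paper.
\begin{align}
  R(s) &= w^{\top}\phi(s), \qquad \lVert w \rVert_{1} \le 1
    && \text{(unknown reward, linear in state features)} \\
  \mu(\pi) &= \mathbb{E}\!\left[\textstyle\sum_{t=0}^{\infty}\gamma^{t}\phi(s_{t}) \,\middle|\, \pi\right]
    && \text{(feature expectations of policy } \pi\text{)} \\
  \text{find } \pi:\ & \lVert \mu(\pi) - \mu_{E} \rVert_{2} \le \epsilon
    \quad \text{s.t.}\quad \mathcal{M}^{\pi} \models \mathcal{P}_{\le \lambda}\big[\Diamond\, \mathit{unsafe}\big]
    && \text{(match expert, satisfy PCTL safety bound)}
\end{align}

Under this reading, the counterexample-guided loop would alternate between an apprenticeship-learning update and a probabilistic model-checking query on the Markov chain induced by the current policy; whenever the PCTL bound is violated, the returned counterexample restricts the next policy search.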
