Paper information - A formal methods approach to interpretable reinforcement learning for robotic planning

A formal methods approach to reinforcement learning generates rewards from a formal language and guarantees safety.

Growing interest in reinforcement learning approaches to robotic planning and control raises concerns about the predictability and safety of robot behaviors realized solely through learned control policies. In addition, formally defining reward functions for complex tasks is challenging, and faulty rewards are prone to exploitation by the learning agent. Here, we propose a formal methods approach to reinforcement learning that (i) provides a formal specification language that integrates high-level, rich task specifications with a priori, domain-specific knowledge; (ii) makes the reward generation process easily interpretable; (iii) guides the policy generation process according to the specification; and (iv) guarantees the satisfaction of the (critical) safety component of the specification. The main ingredients of our computational framework are a predicate temporal logic specifically tailored for robotic tasks and an automaton-guided, safe reinforcement learning algorithm based on control barrier functions. Although the proposed framework is quite general, we motivate it and illustrate it experimentally for a robotic cooking task, in which two manipulators worked together to make hot dogs.
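To make the two ingredients named in the abstract more concrete, here is a minimal toy sketch, not the paper's logic or algorithm. It assumes a hypothetical 1-D system `x_next = x + u`, a hand-written three-state task automaton ("reach x >= 4, then return to x <= 0"), and a discrete-time control barrier condition `h(x_next) >= (1 - GAMMA) * h(x)` for the barriers `h(x) = X_MAX - x` and `h(x) = x - X_MIN`. All names, thresholds, and the stand-in policy are illustrative assumptions.

```python
# Toy sketch: automaton-derived reward plus a barrier-function action filter.
# Everything here (dynamics, bounds, automaton) is a hypothetical example.

X_MIN, X_MAX = -1.0, 5.0   # assumed safe interval for the state x
GAMMA = 0.5                # assumed barrier decay rate, 0 < GAMMA <= 1


def automaton_step(q, x):
    """Advance the task automaton: q=0 -> 1 when x >= 4, q=1 -> 2 when x <= 0."""
    if q == 0 and x >= 4.0:
        return 1
    if q == 1 and x <= 0.0:
        return 2
    return q


def reward(q, q_next):
    """Reward 1 whenever the automaton makes progress, else 0."""
    return 1.0 if q_next > q else 0.0


def safe_action(x, u):
    """Clip u so both barrier conditions hold for x_next = x + u.

    X_MAX - (x + u) >= (1 - GAMMA) * (X_MAX - x)  gives  u <= GAMMA * (X_MAX - x)
    (x + u) - X_MIN >= (1 - GAMMA) * (x - X_MIN)  gives  u >= -GAMMA * (x - X_MIN)
    """
    lo = -GAMMA * (x - X_MIN)
    hi = GAMMA * (X_MAX - x)
    return min(max(u, lo), hi)


def rollout(policy, x0=0.0, steps=50):
    """Run a policy through the safety filter; return (final q, return, states)."""
    x, q, total = x0, 0, 0.0
    xs = [x]
    for _ in range(steps):
        u = safe_action(x, policy(q, x))  # filter the proposed action
        x = x + u                         # assumed toy dynamics
        q_next = automaton_step(q, x)
        total += reward(q, q_next)
        q = q_next
        xs.append(x)
        if q == 2:                        # accepting state: task finished
            break
    return q, total, xs


# Deliberately aggressive stand-in for a learned policy.
final_q, total_reward, states = rollout(lambda q, x: 10.0 if q == 0 else -10.0)
```

Because the policy receives the automaton state `q` alongside `x`, the reward stays Markovian on the product of system and automaton. The filter keeps every visited state inside `[X_MIN, X_MAX]` even though the raw policy proposes actions that would leave it.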
