Towards Verification and Validation of Reinforcement Learning in Safety-Critical Systems: A Position Paper from the Aerospace Industry

Reinforcement learning techniques have been applied successfully to challenging problems. Among the more famous examples are playing board games such as Go and real-time computer games such as StarCraft II. Reinforcement learning has also been deployed successfully in cyber-physical systems, such as robots playing a curling-based game. These are all significant achievements, indicating that the techniques can be of value to the aerospace industry. However, to use these techniques in the aerospace industry, stringent verification and validation requirements must be met. In this position paper, we outline four key problems for the verification and validation of reinforcement learning techniques. Solving these is an important step towards enabling reinforcement learning to be used in safety-critical domains such as the aerospace industry.
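As a concrete, if simplified, illustration of the kind of technique at issue, the sketch below wraps a tabular Q-learning agent in a runtime safety shield that filters out any action leading into a known-unsafe state, in the spirit of shielding approaches from the safe reinforcement learning literature. The toy corridor environment, the names, and all parameter values are illustrative assumptions of ours, not taken from the paper.

import random

# Hypothetical toy environment (our assumption, not from the paper):
# a 1-D corridor of N states, where state 0 is a hazard and state N-1
# is the goal. Actions move the agent one step left or right.
N = 10
HAZARD, GOAL = 0, N - 1
ACTIONS = (-1, +1)

def transition(state, action):
    # Deterministic dynamics, clipped to the corridor.
    return min(max(state + action, 0), N - 1)

def shield_ok(state, action):
    # Runtime shield: reject any action that would enter the hazard.
    return transition(state, action) != HAZARD

def step(state, action):
    nxt = transition(state, action)
    reward = 1.0 if nxt == GOAL else -0.01  # goal bonus, small step cost
    return nxt, reward, nxt == GOAL

Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1  # illustrative hyperparameters

for episode in range(500):
    state, done = N // 2, False
    while not done:
        # In this corridor at least one action is always safe,
        # so the shielded action set is never empty.
        allowed = [a for a in ACTIONS if shield_ok(state, a)]
        if random.random() < EPSILON:
            action = random.choice(allowed)          # shielded exploration
        else:
            action = max(allowed, key=lambda a: Q[(state, a)])  # shielded exploitation
        nxt, reward, done = step(state, action)
        # Standard Q-learning update.
        best_next = max(Q[(nxt, b)] for b in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = nxt

Note that the shield constrains the agent during exploration as well as exploitation; showing that such a shield is itself correct, and that it does not keep the agent from learning a useful policy, is precisely the kind of verification and validation question this paper raises.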
