Paper information - SHERPA: a safe exploration algorithm for Reinforcement Learning controllers

This paper considers the problem of an agent exploring an unknown environment with limited prediction capabilities, in the context of a reinforcement learning controller. We show how this problem can be handled by the Safety Handling Exploration with Risk Perception Algorithm (SHERPA), which relies on interval estimation of the agent's dynamics during the exploration phase, together with the agent's limited ability to perceive incoming fatal instances. An application to a simple quadrotor model demonstrates the algorithm's performance.
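As a rough illustration of what "interval estimation of the dynamics" can look like in practice, the sketch below propagates interval bounds on position and velocity through a 1-D point-mass model and rejects an action if the predicted position interval could leave a known safe region. This is not the SHERPA implementation from the paper. The model, the explicit Euler step, and every name here (`Interval`, `is_action_safe`, the numeric bounds) are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] with the minimal arithmetic needed below."""
    lo: float
    hi: float

    def __add__(self, other: "Interval") -> "Interval":
        # Sum of two intervals: add the matching endpoints.
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def scale(self, k: float) -> "Interval":
        # Multiply by a scalar; a negative k swaps the endpoints.
        a, b = self.lo * k, self.hi * k
        return Interval(min(a, b), max(a, b))

    def inside(self, other: "Interval") -> bool:
        # True if this interval lies entirely within `other`.
        return other.lo <= self.lo and self.hi <= other.hi


def is_action_safe(pos: Interval, vel: Interval, accel: Interval,
                   dt: float, steps: int, safe: Interval) -> bool:
    """Hypothetical check: propagate bounds with explicit Euler steps.

    `accel` bounds the (uncertain) acceleration the action produces.
    The action is accepted only if every predicted position interval
    stays inside the known safe region `safe`.
    """
    for _ in range(steps):
        pos = pos + vel.scale(dt)    # position uses the current velocity bounds
        vel = vel + accel.scale(dt)  # velocity grows by the uncertain acceleration
        if not pos.inside(safe):
            return False
    return True


if __name__ == "__main__":
    start_pos = Interval(0.0, 0.0)
    start_vel = Interval(1.0, 1.0)
    accel_bounds = Interval(0.5, 1.5)  # assumed model uncertainty
    print(is_action_safe(start_pos, start_vel, accel_bounds, 0.1, 10,
                         Interval(-1.0, 2.0)))
    print(is_action_safe(start_pos, start_vel, accel_bounds, 0.1, 10,
                         Interval(-1.0, 1.5)))
```

The check is conservative by construction: an action is rejected whenever any trajectory consistent with the bounds could leave the safe region, even if the true dynamics would have stayed inside it.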

Erik-Jan Van Kampen | Tommaso Mannucci | Coen C. de Visser | Q Ping Chu
