SHERPA: a safe exploration algorithm for Reinforcement Learning controllers

We consider the problem of an agent, driven by a reinforcement learning controller, that must explore an unknown environment with only limited prediction capabilities. We show how this problem can be handled by the Safety Handling Exploration with Risk Perception Algorithm (SHERPA). During the exploration phase, SHERPA relies on an interval estimate of the agent's dynamics, combined with a limited ability of the agent to perceive approaching fatal states. An application to a simple quadrotor model illustrates the algorithm's performance.
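The core idea of using interval estimates of the dynamics to screen candidate actions can be illustrated with a small sketch. This is not SHERPA itself, and the risk-perception component is not modelled. We assume a one-dimensional vertical quadrotor model $\ddot z = u/m - g + d$, where the disturbance $d$ is only known to lie in an interval, and an explicit Euler discretization. We also assume that "fatal" means the altitude drops below `z_min`. The names `Interval`, `step`, and `is_action_safe` are hypothetical. Each one-step update over-approximates the set of states reachable under a constant thrust `u`. An action is accepted only if the whole altitude interval stays above the fatal boundary over a short horizon.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] used to bound an uncertain scalar."""
    lo: float
    hi: float

    def __add__(self, other: "Interval") -> "Interval":
        # Sum of two intervals: add lower and upper bounds separately.
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def scale(self, c: float) -> "Interval":
        # Multiply by a scalar; bounds swap if c is negative.
        a, b = c * self.lo, c * self.hi
        return Interval(min(a, b), max(a, b))

    def shift(self, c: float) -> "Interval":
        # Add a known scalar offset to both bounds.
        return Interval(self.lo + c, self.hi + c)


def step(z: Interval, v: Interval, u: float, dist: Interval,
         m: float = 1.0, g: float = 9.81, dt: float = 0.1):
    """One Euler step of z'' = u/m - g + d, with d in `dist` (assumed model)."""
    z_next = z + v.scale(dt)
    acc = dist.shift(u / m - g)  # interval of possible accelerations
    v_next = v + acc.scale(dt)
    return z_next, v_next


def is_action_safe(z: Interval, v: Interval, u: float, dist: Interval,
                   horizon: int = 10, z_min: float = 0.0, **model) -> bool:
    """Return True if holding thrust u for `horizon` steps keeps the whole
    altitude interval at or above z_min (hypothetical fatal boundary)."""
    for _ in range(horizon):
        z, v = step(z, v, u, dist, **model)
        if z.lo < z_min:
            return False
    return True


if __name__ == "__main__":
    z0, v0 = Interval(1.0, 1.0), Interval(0.0, 0.0)
    dist = Interval(-0.5, 0.5)
    print(is_action_safe(z0, v0, 9.81, dist))  # hover thrust
    print(is_action_safe(z0, v0, 0.0, dist))   # motors off
```

Starting at 1 m altitude and at rest, hover thrust keeps the worst-case altitude well above the ground over the horizon. Cutting the thrust lets the lower bound cross zero within a few steps, so that action would be rejected. Because interval arithmetic over-approximates the reachable set, this kind of check is conservative. It may reject some actions that are actually safe, but it does not accept unsafe ones under the assumed model and disturbance bounds.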