Safe exploration for reinforcement learning

In this paper we define and address the problem of safe exploration in the context of reinforcement learning. Our notion of safety is concerned with states or transitions that can lead to damage and thus must be avoided. We introduce the concept of a safety function for determining a state's safety degree, and that of a backup policy that can lead the system under control from a critical state back to a safe one. Moreover, we present a level-based exploration scheme that generates a comprehensive base of observations while adhering to safety constraints. We evaluate our approach on a simplified simulation of a gas turbine.
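
To make the interplay of the two concepts concrete, the following is a minimal Python sketch of how a safety function and a backup policy could guard an exploration loop. All names (safety_fn, backup_policy, explore, SAFE_THRESHOLD) and the toy one-dimensional dynamics are illustrative assumptions, not the paper's actual method or experimental setup.

import random

SAFE_THRESHOLD = 0.0  # states with safety degree at or below this are critical

def safety_fn(state):
    """Assumed safety function: returns a scalar safety degree for a state.
    Here: distance from an unsafe boundary at |state| = 1."""
    return 1.0 - abs(state)

def backup_policy(state):
    """Assumed backup policy: chooses an action driving the system back
    toward the known-safe region (the origin in this toy system)."""
    return -0.2 if state > 0 else 0.2

def exploratory_action():
    """Random exploratory action, as a stand-in for a real exploration scheme."""
    return random.uniform(-0.3, 0.3)

def step(state, action):
    """Toy deterministic transition: the action shifts the state directly."""
    return state + action

def explore(state, n_steps=20):
    """Exploration loop: take exploratory actions only while the current
    state's safety degree is above the threshold; otherwise hand control
    to the backup policy until the system returns to a safe state."""
    for t in range(n_steps):
        if safety_fn(state) > SAFE_THRESHOLD:
            action = exploratory_action()
        else:
            action = backup_policy(state)  # critical state: return to safety
        state = step(state, action)
        print(f"t={t:2d} state={state:+.3f} safety={safety_fn(state):+.3f}")
    return state

if __name__ == "__main__":
    random.seed(0)
    explore(state=0.0)

Running the sketch prints the trajectory and its safety degree at each step; whenever exploration drifts the state near the unsafe boundary, the backup policy takes over and steers it back, which is the basic guarantee the abstract describes.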